Foreign-Language Technical Literature and Translation: SAS Statistical Analysis Software and Logistic Regression

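1. Overview

SAS stands for Statistical Analysis System. It was originally written by two biostatistics graduate students at North Carolina State University. In 1976 the SAS Institute was founded and the SAS software was formally released. SAS is a large integrated information system for decision support. Its earliest functions, however, were limited to statistical analysis, and statistical analysis is still an important part of its core functionality today.

At the time of writing, the current version of SAS was 9.0, about 1 GB in size. After many years of development, SAS has been adopted by nearly 30,000 institutions in more than 120 countries and regions. It has more than three million direct users in finance, medicine and health care, manufacturing, transportation, telecommunications, government, and education and research. In the United States, the United Kingdom, and other countries, proficiency with SAS for statistical analysis is one of the hiring criteria of many companies and research institutions. In data processing and statistical analysis, SAS is regarded as the international standard software system, and in 1996-97 it was selected as the preferred product for building databases. It is a true giant of the statistical software world.

One often-cited example concerns the famously strict U.S. FDA new-drug approval process. It is said that statistical analyses of trial results may only be performed in SAS, and that results from any other software are rejected, even for something as simple as a mean and standard deviation. The FDA does not in fact mandate any particular statistical software, but the claim reflects how authoritative SAS is considered to be.

SAS is a modular system made up of multiple functional modules, the basic one being Base SAS. Base SAS is the core of the SAS system. It handles the main data-management tasks, manages the user environment, processes the user's language, and calls the other SAS modules and products. In other words, running SAS always starts with Base SAS. Besides its own data management, programming, and descriptive statistics functions, Base SAS acts as the central dispatcher of the whole system. It can be used on its own or combined with other products or modules into a complete system, and each module can be installed and updated easily through the installer.

SAS has flexible extension interfaces and powerful functional modules. On top of Base SAS, the following modules can be added for additional functionality:
- SAS/STAT (statistical analysis)
- SAS/GRAPH (graphics)
- SAS/QC (quality control)
- SAS/ETS (econometrics and time series analysis)
- SAS/OR (operations research)
- SAS/IML (interactive matrix programming language)
- SAS/FSP (interactive menu system for fast data processing)
- SAS/AF (interactive full-screen application system)

SAS has an intelligent graphics system that can draw not only all kinds of statistical charts but also maps. SAS provides many statistical procedures, each with a very rich set of options. Users can carry out more complex analyses by processing a data set through a sequence of steps. In addition, SAS provides probability distribution functions, quantile functions, sample statistic functions, and random number generators, so users can easily meet special statistical requirements.

2. Mode of operation

SAS grew out of mainframe systems, and its core mode of operation is program-driven. Over many years of development it has become a complete computer language, and its user interface reflects this. It uses an MDI (multiple document interface): the user types a program in the PGM (Program Editor) window, and the analysis results are written as text to the OUTPUT window. Through programs, users can do all the work they need, including statistical analysis, forecasting, modeling, and simulation sampling. The drawback is that beginners must learn the SAS language before using SAS, which makes getting started relatively hard.

The Windows version of SAS offers several graphical interfaces developed for different user groups. Each has its own strengths and is very convenient to use. However, little has been published about them in China and they are not the focus of SAS's promotion, so most people are still unaware of them.

3. Basic operations and concepts of SAS

3.1 Data sets and libraries

All statistical operations act on data. In SAS, a file that holds data is called a data set, and data sets are stored in libraries (for now, think of a library as a database). SAS libraries come in two kinds, permanent and temporary. As the names suggest, a data set in a permanent library persists as long as you do not delete it. A data set in a temporary library is deleted automatically when you exit SAS. The simplest way to think of a SAS library is as a directory in which data sets are stored.

The structure of a data set is exactly that of an ordinary data table, made up of fields and records. In statistics, fields are usually called variables; in what follows, "field" and "variable" mean the same thing. There are many ways to create a data set. A program can use dedicated data-input statements, but that requires typing the data in on the spot, which is time-consuming and laborious. If you have a lot of data, it is better to build the data set by other means first. Otherwise most of your program will be spent on entering data.

3.2 Overview of SAS programs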

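Like other computer languages, the SAS language has its own vocabulary (keywords) and syntax. It is distinct from SCL, the SAS Component Language, which is used to build SAS/AF applications. Keywords, names, special characters, and operators are combined according to the syntax rules into SAS statements. A group of SAS statements that performs a complete task forms a SAS program.

A SAS program consists of steps and some control statements, normally DATA steps and PROC (procedure) steps. Any combination of one or more DATA steps and PROC steps is a valid SAS program, as long as it performs a complete task. A SAS program usually also contains some global statements, which control options, variables, or the runtime environment throughout the program.

A SAS statement usually begins with a keyword and ends with a semicolon, and one statement may span several lines. Whenever SAS reaches a semicolon, it treats everything since the previous semicolon as a single statement, however many lines it occupies. SAS statements are not case-sensitive, so you can write them in upper- or lowercase as you prefer.

4. Logistic regression

Logistic regression belongs to a class of statistical models called generalized linear models. This broad class includes ordinary regression and analysis of variance (ANOVA). It also includes multivariate statistics such as analysis of covariance (ANCOVA) and log-linear regression. An excellent treatment of generalized linear models is given in Agresti (1996).

Logistic regression predicts a discrete outcome, such as group membership, from a set of variables. Those variables may be continuous, discrete, dichotomous, or any mixture of these. Usually the dependent variable is dichotomous, such as presence/absence or success/failure. Discriminant analysis is also used to predict group membership when there are only two groups, but it can only be used with continuous independent variables. Therefore, when the independent variables are categorical, or a mixture of continuous and categorical, logistic regression is preferred.

4.1 The model

The dependent variable in logistic regression is usually dichotomous. It takes the value 1 when the event occurs (success, with probability q) and 0 when it does not (failure, with probability 1 - q). A variable of this type is called a Bernoulli (or binary) variable. Logistic regression has also been extended to dependent variables with more than two categories, although this is less common and is not discussed here. That case is called multinomial or polytomous; Tabachnick and Fidell (1996) use the term polychotomous.

As mentioned above, the independent (predictor) variables in logistic regression can take any form. That is, logistic regression makes no assumption about the distribution of the independent variables. They need not be normally distributed, linearly related, or of equal variance within each group. The relationship between the predictors and the response is not a linear function. Instead, logistic regression uses the logit transformation of q:

$$\operatorname{logit}(q) = \ln\left(\frac{q}{1-q}\right) = a + b_1X_1 + b_2X_2 + \cdots + b_kX_k$$

where a is the intercept (the constant of the equation) and b_1, ..., b_k are the coefficients of the predictor variables X_1, ..., X_k. An equivalent form of the logistic regression equation is:

$$q = \frac{e^{a + b_1X_1 + \cdots + b_kX_k}}{1 + e^{a + b_1X_1 + \cdots + b_kX_k}}$$

The goal of logistic regression is to correctly predict the outcome category of individual cases using the most parsimonious model. To achieve this, a model is built that includes all the predictor variables useful for predicting the response variable. Several options are available while building the model. Variables can be entered in an order specified by the researcher. Alternatively, logistic regression can test the fit of the model after each coefficient is added or removed; this is called stepwise regression.

Stepwise regression is used in the exploratory phase of research, but it is not recommended for theory testing (Menard 1995). Theory testing tests a-priori theories or hypotheses about the relationships between variables. Exploratory testing makes no a-priori assumptions about these relationships; its goal is to discover them.

Backward stepwise regression appears to be the preferred method for exploratory analysis. The analysis starts with a full (saturated) model, and variables are eliminated from it one at a time in an iterative process. After each variable is eliminated, the fit of the model is tested to make sure it still fits the data adequately. When no more variables can be eliminated, the analysis is complete.

Logistic regression has two main uses. The first is predicting group membership. Logistic regression computes the probability of success relative to the probability of failure, so its results are expressed as odds ratios. For example, logistic regression is often used in epidemiological studies, where the result of the analysis is the probability of developing cancer after controlling for other risk factors. The second use is that logistic regression shows the relationships among the variables and their strengths. For example, smoking 10 packs a day puts you at a higher risk of developing cancer than working in an asbestos mine. Several different techniques test whether coefficients are significant enough to include in or eliminate from the model; they are discussed below.

4.2 The Wald test

The Wald test is used to test whether each coefficient (b) in the model differs significantly from 0. It computes a z statistic:

$$z = \frac{b}{SE_b}$$

where SE_b is the standard error of b. Squaring z gives the Wald statistic, which follows a chi-square distribution with 1 degree of freedom. However, several authors have identified problems with the Wald test. Menard (1995) warns that for large coefficients the standard error is inflated, which lowers the Wald statistic.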
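The Wald computation described above can be sketched in Python. Python is chosen here for illustration, since the text itself contains no code. The function name `wald_test` and the numbers in the usage line are hypothetical. The p-value relies on one fact: a chi-square variable with 1 degree of freedom is the square of a standard normal variable, so its upper tail at z^2 equals erfc(|z| / sqrt(2)).

```python
import math


def wald_test(b: float, se_b: float) -> tuple[float, float, float]:
    """Return (z, Wald statistic, p-value) for a single coefficient b.

    z = b / SE_b; squaring z gives the Wald statistic, compared with a
    chi-square distribution with 1 degree of freedom.
    """
    if se_b <= 0:
        raise ValueError("standard error must be positive")
    z = b / se_b
    wald = z * z
    # For 1 df: P(chi2 > z^2) = P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, wald, p_value


# Hypothetical coefficient and standard error, for illustration only.
z, wald, p = wald_test(b=0.8, se_b=0.25)
print(f"z = {z:.2f}, Wald = {wald:.2f}, p = {p:.4f}")

# Same coefficient with a doubled standard error: the Wald statistic drops.
print(f"Wald with inflated SE = {wald_test(0.8, 0.5)[1]:.2f}")
```

Doubling SE_b while b stays fixed halves z and quarters the Wald statistic. This is the mechanism behind Menard's warning that an inflated standard error lowers the Wald statistic.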

Agresti (1996) notes that the likelihood-ratio test is more reliable than the Wald test, particularly for small samples.

4.3 The likelihood-ratio test

The likelihood-ratio test compares two maximized likelihoods: L_1 for the full model and L_0 for the simpler model. It uses the ratio of the two. The likelihood-ratio test statistic is:

$$-2\ln\left(\frac{L_0}{L_1}\right) = -2\left(\ln L_0 - \ln L_1\right)$$

This log transformation of the likelihood functions yields a chi-square statistic. It is the recommended test statistic when building a model by backward stepwise elimination.

4.4 The Hosmer-Lemeshow goodness-of-fit test

The Hosmer-Lemeshow statistic assesses goodness of fit by forming 10 ordered groups of subjects. It then compares the number actually observed in each group with the number predicted by the logistic regression model. The test statistic is therefore a chi-square statistic, and the desired result is non-significance. A non-significant result indicates that the model's predictions do not differ significantly from the observations.

The 10 groups are formed according to the subjects' estimated probabilities. Those with an estimated probability below 0.1 form one group, and so on up to those with probabilities from 0.9 to 1.0. Each group is further split in two according to the actually observed outcome (success or failure), and the expected frequency of each cell is obtained from the model. If the model is good, most subjects with a success fall into the higher-risk groups and most subjects with a failure fall into the lower-risk groups.

Foreign-Language Technical Literature

SAS Statistical Analysis Software and Logistic Regression

1. Overview

SAS stands for Statistical Analysis System. It was first written by two biostatistics graduate students at North Carolina State University. In 1976 the SAS Institute was established and the SAS software was formally launched. SAS is a large-scale integrated information system for decision support. Its first functions were limited to statistical analysis, and statistical analysis is still an important part of its core functionality.

At the time of writing, the current SAS version was 9.0, about 1 GB in size. After years of development, SAS has been adopted by nearly 30,000 institutions in more than 120 countries and regions. It has over three million direct users across finance, medicine and health, manufacturing, transportation, communications, government, and education and scientific research. In Britain, the United States, and other countries, skill in using SAS for statistical analysis is one of the selection criteria of many companies and research institutions. In data processing and statistical analysis, SAS is known as the international standard software system, and in 1996-97 it was selected as the first-choice product for building databases. SAS can fairly be called the giant of the statistical software world.

One example of its standing comes from the famously strict U.S. FDA drug approval process. It is often said that the statistical analysis of drug trial results there must be carried out in SAS. Results from any other software are supposedly void, even a simple mean and standard deviation. The FDA does not actually require any particular statistical software, but the claim shows how authoritative SAS is considered to be.

SAS is a combined software system made up of multiple functional modules, whose basic part is the Base SAS module. Base SAS is the core of the SAS system. It carries out the main data-management tasks, manages the user environment, processes the user's language, and calls the other SAS modules and products. In other words, running SAS starts with Base SAS. In addition to its own data management, programming, and descriptive statistics functions, Base SAS acts as the central dispatcher of the SAS system. It can stand alone, or form a complete system together with other products or modules. Each module can be installed and updated very easily through the installation program.

SAS has flexible extension interfaces and powerful functional modules. On top of Base SAS, you can add the following modules for a variety of new features:
- SAS/STAT (statistical analysis module)
- SAS/GRAPH (graphics module)
- SAS/QC (quality control module)
- SAS/ETS (econometrics and time series analysis module)
- SAS/OR (operations research module)
- SAS/IML (interactive matrix programming language module)
- SAS/FSP (interactive menu system module for fast data processing)
- SAS/AF (interactive full-screen application system module)

SAS has an intelligent graphics system that can draw not only a variety of charts but also maps. SAS provides a wide range of statistical procedures, each with a great many options. Users can chain a series of data processing steps to carry out more complex statistical analyses. In addition, SAS offers a variety of probability functions, quantile functions, sample statistics functions, and random number generator functions, so users can easily meet special statistical requirements.

2. Operation

SAS was developed from mainframe systems, and its core mode of operation is program-driven. After many years of development, SAS has become a complete computer language, and its user interface fully reflects this. It uses an MDI (multiple document interface): the user enters a program in the PGM window, and the results of the analysis are output as text in the OUTPUT window. Using programs, users can complete all their work, including statistical analysis, forecasting, modeling, and simulation sampling.

The drawback is that beginners have to learn the SAS language, which makes getting started more difficult. The Windows version of SAS provides several graphical user interfaces developed for different user groups. Each has its own characteristics and is very convenient to use. However, little has been written about them and they are not the focus of SAS's promotion, so most people do not know about them.

3. Basic operations and basic concepts of SAS

3.1 Data sets and libraries

Statistical operations are performed on data. In SAS, a file that holds data is called a data set, and data sets are contained in different libraries (for the time being, think of a library as a database). SAS libraries are of two types, permanent and temporary. As the names suggest, a data set in a permanent library is permanent, as long as you do not delete it. Data sets in a temporary library are deleted automatically when you exit SAS. The simplest way to understand a SAS library is as a directory in which data sets are stored.

The structure of a data set is exactly the same as that of an ordinary data table, consisting of fields and records. In statistics, fields are usually called variables, and in what follows we treat fields and variables as the same thing. There are many ways to create a data set. Programs have dedicated data-input methods for building data sets, but they require entering the data on the spot, which is time-consuming and laborious. If the amount of data is large, I advise you to build the data set by other means first. Otherwise most of your program statements will be wasted on data input.

3.2 Overview of SAS programs

Like other computer languages, the SAS language has its own vocabulary (keywords) and grammar. It is distinct from SCL, the SAS Component Language used to build SAS/AF applications. Keywords, names, special characters, and operators are arranged according to the grammar rules to form SAS statements. A group of SAS statements that implements a complete function constitutes a SAS program.

A SAS program consists of steps and control statements, generally DATA steps and PROC steps. One or more DATA steps or PROC steps, in any combination, can form a SAS program, as long as together they perform a complete function. SAS programs usually also include some global statements, which control options, variables, or the program environment throughout the program.

A SAS statement begins with a keyword and ends with a semicolon, and a statement can occupy several lines. Whenever SAS sees a semicolon, it takes everything after the previous semicolon as one statement, regardless of how many lines it spans. SAS statements are case-insensitive, so you may use upper- or lowercase letters according to personal habit.

4. Logistic regression

Logistic regression is part of a category of statistical models called generalized linear models. This broad class of models includes ordinary regression and ANOVA, as well as multivariate statistics such as ANCOVA and loglinear regression. An excellent treatment of generalized linear models is presented in Agresti (1996). Logistic regression allows one to predict a discrete outcome, such as group membership, from a set of variables. Those variables may be continuous, discrete, dichotomous, or a mix of any of these.

Typically, the dependent or response variable is dichotomous, such as presence/absence or success/failure. Discriminant analysis is also used to predict group membership with only two groups. However, discriminant analysis can only be used with continuous independent variables. Thus, when the independent variables are categorical, or a mix of continuous and categorical, logistic regression is preferred.

4.1 The model

The dependent variable in logistic regression is usually dichotomous. It can take the value 1 with a probability of success q, or the value 0 with a probability of failure 1 - q. This type of variable is called a Bernoulli (or binary) variable. Applications of logistic regression have also been extended to cases where the dependent variable has more than two categories, although this is less common and is not discussed in this treatment. That case is known as multinomial or polytomous; Tabachnick and Fidell (1996) use the term polychotomous.

As mentioned previously, the independent or predictor variables in logistic regression can take any form. That is, logistic regression makes no assumption about the distribution of the independent variables. They do not have to be normally distributed, linearly related, or of equal variance within each group. In logistic regression, the relationship between the predictor and response variables is not a linear function. Instead, the logistic regression function is used, which is the logit transformation of q:

$$\operatorname{logit}(q) = \ln\left(\frac{q}{1-q}\right) = a + b_1X_1 + b_2X_2 + \cdots + b_kX_k$$

where a is the constant of the equation and b_1, ..., b_k are the coefficients of the predictor variables X_1, ..., X_k. An alternative form of the logistic regression equation is:

$$q = \frac{e^{a + b_1X_1 + \cdots + b_kX_k}}{1 + e^{a + b_1X_1 + \cdots + b_kX_k}}$$

The goal of logistic regression is to correctly predict the category of outcome for individual cases using the most parsimonious model. To accomplish this goal, a model is created that includes all predictor variables that are useful in predicting the response variable. Several different options are available during model creation. Variables can be entered into the model in the order specified by the researcher. Alternatively, logistic regression can test the fit of the model after each coefficient is added or deleted, which is called stepwise regression.

Stepwise regression is used in the exploratory phase of research, but it is not recommended for theory testing (Menard 1995). Theory testing is the testing of a-priori theories or hypotheses about the relationships between variables. Exploratory testing makes no a-priori assumptions about the relationships between the variables; its goal is to discover relationships.

Backward stepwise regression appears to be the preferred method for exploratory analyses. The analysis begins with a full or saturated model, and variables are eliminated from the model in an iterative process. The fit of the model is tested after each variable is eliminated, to ensure that the model still adequately fits the data. When no more variables can be eliminated from the model, the analysis is complete.

There are two main uses of logistic regression. The first is the prediction of group membership. Logistic regression calculates the probability of success over the probability of failure, so the results of the analysis are in the form of an odds ratio. For example, logistic regression is often used in epidemiological studies, where the result of the analysis is the probability of developing cancer after controlling for other associated risks. The second is that logistic regression provides knowledge of the relationships among the variables and their strengths. For example, smoking 10 packs a day puts you at a higher risk of developing cancer than working in an asbestos mine. Several different techniques are used to test coefficients for significance, for inclusion in or elimination from the model. Each of these is discussed below.
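The odds-ratio reading described above follows from the two equivalent forms of the model in section 4.1. The Python sketch below uses hypothetical coefficients and illustrative function names. It computes q from the linear predictor a + b_1X_1 + ... + b_kX_k and recovers the logit. It also checks numerically that raising X_1 by one unit multiplies the odds q/(1 - q) by e^{b_1}.

```python
import math
from typing import Sequence


def linear_predictor(a: float, b: Sequence[float], x: Sequence[float]) -> float:
    """a + b1*x1 + ... + bk*xk, the right-hand side of the logit equation."""
    if len(b) != len(x):
        raise ValueError("need one coefficient per predictor")
    return a + sum(bi * xi for bi, xi in zip(b, x))


def logit(q: float) -> float:
    """logit(q) = ln(q / (1 - q)), the log odds of success."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    return math.log(q / (1.0 - q))


def inverse_logit(eta: float) -> float:
    """q = e^eta / (1 + e^eta), written to avoid overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


# Hypothetical intercept and coefficients, for illustration only.
a, b = -2.0, [0.7, -0.3]
x = [3.0, 1.0]

eta = linear_predictor(a, b, x)   # -2 + 2.1 - 0.3, i.e. about -0.2
q = inverse_logit(eta)            # predicted probability of success
print(f"q = {q:.3f}, logit(q) = {logit(q):.3f}")

# Raising X1 by one unit multiplies the odds q / (1 - q) by e^{b1}.
odds_before = q / (1.0 - q)
q_after = inverse_logit(linear_predictor(a, b, [4.0, 1.0]))
odds_after = q_after / (1.0 - q_after)
print(f"odds ratio = {odds_after / odds_before:.4f}, e^b1 = {math.exp(b[0]):.4f}")
```

The odds ratio does not depend on the starting value of X_1 or on the other predictors. That is why a fitted coefficient can be reported directly as e^b.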

4.2 Wald test

A Wald test is used to test the statistical significance of each coefficient (b) in the model. A Wald test calculates a z statistic, which is:

$$z = \frac{b}{SE_b}$$

where SE_b is the standard error of b. This z value is then squared, yielding a Wald statistic with a chi-square distribution. However, several authors have identified problems with the use of the Wald test.
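Section 4.3 of the translated text above recommends the likelihood-ratio statistic, -2(ln L0 - ln L1), as the alternative to the Wald test. It recommends the same statistic when a model is built by backward stepwise elimination. The Python sketch below combines the two ideas. It is not a model-fitting routine. `loglik` is a hypothetical callback that would return the maximized log-likelihood of a logistic model fitted with a given set of predictors, for example from a statistics package. The toy log-likelihoods in the usage lines are invented for illustration.

```python
import math
from typing import Callable, FrozenSet, Iterable, List


def chi2_sf_df1(stat: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))


def likelihood_ratio(loglik_reduced: float, loglik_full: float) -> float:
    """-2 (ln L0 - ln L1): simpler model L0 compared with fuller model L1."""
    return -2.0 * (loglik_reduced - loglik_full)


def backward_eliminate(
    variables: Iterable[str],
    loglik: Callable[[FrozenSet[str]], float],
    alpha: float = 0.05,
) -> List[str]:
    """Backward stepwise elimination driven by the likelihood-ratio test.

    Start from the full model. On each pass, find the variable whose removal
    worsens the fit least (largest p-value, 1 df per dropped variable) and
    drop it if that p-value exceeds alpha. Stop when no variable can be
    dropped. `loglik` is a hypothetical callback returning the maximized
    log-likelihood of a model fitted with the given predictors.
    """
    current = frozenset(variables)
    while current:
        ll_full = loglik(current)
        best_var, best_p = None, -1.0
        for var in sorted(current):
            g = likelihood_ratio(loglik(current - {var}), ll_full)
            p = chi2_sf_df1(g)
            if p > best_p:
                best_var, best_p = var, p
        if best_p <= alpha:
            break  # every remaining variable contributes significantly
        current = current - {best_var}
    return sorted(current)


# Toy log-likelihoods (hypothetical): each variable raises ln L by a fixed amount.
gain = {"age": 12.0, "smoking": 20.0, "shoe_size": 0.3}


def toy_loglik(model: FrozenSet[str]) -> float:
    return -150.0 + sum(gain[v] for v in model)


print(backward_eliminate(gain, toy_loglik))  # ['age', 'smoking']
```

Each candidate removal is tested with 1 degree of freedom because it drops a single coefficient. A categorical predictor coded as several dummy variables would need a chi-square test with more degrees of freedom, which this sketch does not handle.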
