




版權說明:本文檔由用戶提供并上傳,收益歸屬內容提供方,若內容存在侵權,請進行舉報或認領
文檔簡介
1、Chapter 10 lexical analyzer (lex)Speaker: Lung-Sheng ChienReference book: John R. Levine, lex & yacc 中譯本, 林偉豪譯Reference ppt: Lecture 2: Lexical Analysis, CS 440/540, George Mason universityReference URL: http:/ Online manual: http:/ OutLine What is lex Regular expression Finite state machine Con
2、tent of flex Application Recall Exercise 7 in the midtermQuestion: can we write more compact code to obtain integers?for each line in a file if line contains “/” not in a string, then remove remaining characters after “/”. if line contains “/*” not in a string, then find conjugate pair “*/” and remo
3、ve all characters in betweenendforExercise 7: remove comments in a filein C-language, comment is delimited by a pair of /* and */ whereas in C+, comment starts from /. write a program to remove all comments of a given file. You can show result in screen or to another file.Pseudo-codeQuestion: can we
4、 have other tool to identify C-comment ?What is lexLex is a program generator designed for lexical (語彙的) processing of character input streams. It accepts a high-level, problem oriented specification for character string matching, and produces a program in a general purpose language which recognizes
5、 regular expressions (正規(guī)表示法).The regular expressions are specified by the user in the source specifications given to Lex.Lex generates a deterministic finite automaton (DFA, 有限自動機) from the regular expressions in the source.The Lex written code recognizes these expressions in an input stream and par
6、titions the input stream into strings matching the expressions.From http:/ definitionTokenPattern Lexeme(詞彙 ) integer(0-9)+234identifiera-zA-Z?a-zA-Z0-9*x1stringCharacters between “ “hello world”Token: set of strings defining an atomic element with a defined meaningPattern: a rule describing a set o
7、f stringLexeme: a sequence of characters that match some patternLexical analyzer Syntax analyzer (文法分析)Semantic analyzer (語意分析)Intermediate code generatorCode optimizerCode generatorSource codemachine codePhases of a CompilerLex is a crucial tool to extract token tokenInput fileScanneryylex()parsery
8、yparse()symbol tableask next charactercharacterask next tokentokenInput fileScanneryylex()ask next charactercharacterFile processor ofLinear programmingask next tokentokenRole of scanner: find tokenflexLex specificationlex.yy.clex.yy.o + source filegcc -cg+a.outinputtokenflex : lexical analyzer gene
9、rator C-code lex.yy.c is kernel to extract token, one just need to call function yylex(). To use lex.yy.c in different platforms, we need to solve several technical problems - dont use library - dont include specific header file- mix C with C+ code flex in RedHat 9Link with library libfl.acount_line
10、.txtLibrary libfl.a Generate source C-code lex.yy.c按 Ctrl+DThisisabooknbyebyen12345678910 1112 13 14 15Example in the manual of Flex Count number of lines and number of characters按 enterdefinition section%rule section%user codegrammar of input fileUser codeLex copy data enclosed by % and % into C so
11、urce filepattern actionn +num_lines ; + num_chars ; pattern actionWhen pattern is matched, then execute actionGrammar of input file of Flex 1 . + num_chars ; wild card character, represent any character expect line feed ndefault mainlex.yy.cGrammar of input file of Flex 2 Q1: can we compile lex.yy.c
12、 without lfl ? 1 Library libfl.a contains function yywrap() -lfl means “include library libfl.a”, this library locates in /usr/libcontains function yywrap() We want to use lex.yy.c on different platforms (Linux and windows), to avoid specific library is lesson one.Q1: can we compile lex.yy.c without
13、 lfl ? 2 count_line.txtImplement function yywrap explicitlyQ2: how to process a file? count_line.txtlex.yy.cyyin is a file pointer in lex, function yylex() read characters from yyinQ3: can we move function main to another file? count_line.txtmain.cppcode blockExercise: mix C-code with C+ codeIn this
14、 work, lex.yy.c is C-code and main.cpp is C+-code, what happens if we issue command “g+ main.cpp lex.yy.c”? Thats why we use two steps,step 1: gcc c lex.yy.cstep 2: g+ main.cpp lex.yy.oIf we replace extern C int yylex( void ) ; with int yylex( void ) ;Does “g+ main.cpp lex.yy.c” work?Q4: can we comp
15、ile lex.yy.c in VC6.0? 1 Download lex.yy.c and main.cpp in Q3 into local machine Error occurs when compiling lex.yy.c VC does not have this header fileQ4: can we compile lex.yy.c in VC6.0? 2 /usr/include/unistd.hQ4: can we compile lex.yy.c in VC6.0? 3 disable “unistd.h” in VC6.0 Error occurs since p
16、rototype of function isatty is declared in unistd.h /usr/include/unistd.hQ4: can we compile lex.yy.c in VC6.0? 4 main.cpplex.yy.cOutLine What is lex Regular expression Finite state machine Content of flex Application Regular expressionA regular expression, often called a pattern, is an expression th
17、at describes a set of strings. The origins of regular expressions lie in automata theory and formal language theory, both of which are part of theoretical computer science . In the 1950s, mathematician Stephen Cole Kleene described these models using his mathematical notation called regular sets.Mos
18、t formalisms provide the following operations to construct regular expressions- alternation: A vertical bar separates alternatives. For example, gray|grey can match “gray” or “grey”. - grouping: use parentheses to define the scope and precedence of the operators. For example, gray|grey and gr(a|e)y
19、are equivalent.- quantification (量化): a quantifier after a token (such as a character) or group specifies how often that preceding element is allowed to occur. From /wiki/Regular_expression Syntax of regular expression 1metasequencedescription.matches any single character excep
20、t newline matches a single character that is contained within the brackets. abc = a, b, c 0-9 = 0,1,2,3,4,5,6,7,8,9 matches a single character that is not contained within the brackets. abc = x is a character : x is not a or b or c matches the starting position within the string$matches the ending p
21、osition of the string or the position just before a string-ending newlinem,nmatches the preceding element at least m and not more than n times. a3,5 matches only “aaa”, “aaaa” and “aaaaa”, NOT “aa”在方括號中如果放的是名稱, 且放在樣式開頭的話, 代表這個樣式只用在某個開始狀態(tài)Syntax of regular expression 2metasequencedescription*matches t
22、he preceding element zero or more timesab*c matches “ac”, “abc”, “abbc”+matches the preceding element one or more times0-9+ matches “1”, “14”, “983”?matches the preceding element zero or one time0-9? matches “ ”, “9”|the choice (aka alternation or set union) operator matches either the expression be
23、fore or the expression after the operator.abc|def matches “abc” or “def” ( )group to be a new expression(01) denotes string “01” escape character* means wild card, * means ASCII code of * “”代表引號中的全部字元, 所有引號中的後設字元都失去它們特別的意義, 除 之外“/*” 代表兩個字元 / 和 *Example: based-10 integer one digit of regular expressi
24、on0-9positive integer is composed of many digits 0-9+0-9* is not adequate, since 0-9* can accept empty stringwe need a sign to represent all integers-?0-9+Accepted string: “-5”, “1234”, “0000”, “-000”, “9276000”Question: How to represent based-16 integer under regular expression?OutLine What is lex
25、Regular expression Finite state machine Content of flex Application Finite state machine (FSM) -?0-9+S0minusdigit-0-90-90-9Current stateInput token (transition function)Next statedescriptionS0-minusS0 is initial state0-9digitminus0-9digitminus state recognize string “-”digit0-9digitdigit state recog
26、nize string “-0-9+” or “0-9+”trapterminateintegertrap0-9-0-9state transition diagram- 1 2 3 4S0minusdigit-State sequence - 1 2 3 4S0minus-1digit- 1 2 3 4S0minus-1digit2digit- 1 2 3 4S0minus-1digit2digit3digit- 1 2 3 4S0minus-1digit2digit3digit4Transform FSM to C-code S0minusdigit-0-90-90-9trap0-9-0-
27、911223344556677test.txtDriver to yylex_integer main.cppExercise: extract real number -?0-9*.0-9+(Ee-+?0-9+)?)real numberwhy do we need a escape character for dot, “.” ?Can this regular expression identify all real numbers?depict state transition diagram of finite state machine for this regular expre
28、ssion. Implement this state transition diagram and write a driver to test itUse flex to identify (1) integer (2) real number, note that you need to neglect space character tn OutLine What is lex Regular expression Finite state machine Content of flex Application How flex worksflex works by processin
29、g the file one character at a time, trying to match a string starting from that character1. flex always attempts to match the longest possible string2. if two rules are matched (and match strings are same length), the first rule in the specification is used.Once it matches a string, it starts from t
30、he character after the string.Once a rule is matched, flex execute corresponding action, if no “return” is executed, then flex automatically matches next token. flex always creates a file named “l(fā)ex.yy.c” with a function yylex().The flex library supplies a default “main”: main(int argc, char* argv)
31、return yylex() ; However we prefer to write our “main”.Lex states Regular expressions are compiled to finite state machine flex allows the user to explicitly declare multiple states%x CMNT /exclusive starting condition%s STRING /inclusive starting condition Default initial state is INITIAL (0) Actio
32、ns for matched strings may be different for different stateyylex()當 token 配對到樣式後, 會執(zhí)行一段 C 語言程式碼, 然後藉由 return 會讓 yylex() 傳回一個傳回值給呼叫程式. 等到下次再呼叫 yylex() 時, 字彙分析器就從上次停下來的地方繼續(xù)做下去yylex() return 0 when encounters EOF. count_line.txtmain.cppcall yylex() till End-Of-Filereturn to caller when matching a token
33、yytext當字彙分析器辨識出一個 token 之後, token 的文字會存在 yytext 字串中, 且以空字元 (null, 0) 結尾. 且 token 的長度記錄在 yyleng, 即 yyleng = strlen(yytext)yytext 是字元陣列, 宣告為extern char yytext ; 或 extern char *yytext ; yytext 的內容在每辨識出一個新的 token 之後, 就會被更新. 假如之後想用到 yytext 的內容, 請自行複製因為 yytext 是陣列型態(tài), 比 yytext 還長的 token 將導致 overflow. 在 fle
34、x 中, 預設的 I/O 暫存區(qū)是 16KB, 所以可以處理 8KB 的 token. 即便 token 是一段注解是不會產(chǎn)生 overflow 的問題lex.yy.cyywrap()當字彙分析器讀到檔案結尾時, 它會呼叫 yywrap() 函式來看看接下來要做什麼. 假如 yywrap() 函式傳回 0, 則字彙分析器繼續(xù)作分析 ;假如 yywrap() 函式傳回 1,則字彙分析器傳回一個 token 0 來代表遇到檔案結尾在 lex 函式庫中的標準 yywrap() 函式永遠會傳回 1, 但是你可以用自己寫的來代替它.假如 yywrap() 函式傳回 0, 表示還有其它的輸入資料, 這個時
35、候需要先重新設定 yyin 指向新的檔案 (用 fopen 來設定)在我們的 lex 輸入檔中, 我們定義 yywrap() 永遠回傳 1, 表示只有一個檔案需要處理count_line.txtyyinput(), yyunput()flex 提供 yyinput() 以及 yyunput() 來包裝 input(), unput().unput(c) 函式會將字元 c 放回輸入資料中. 和一般 stdio 中 unputc()函式不同的是: 你可以連續(xù)呼叫 unput() 來將一堆字元放回去.lex.yy.cyyless(), yymore()在動作程式碼中呼叫 yyless(n), 會將該
36、規(guī)則配對到的 token 保留前 n 個字元, 其它的則 “放” 回去. 在判斷 token 的邊界時, 而且又不容易表示成常規(guī)表示法時很有用. yyless 和 yymore 可搭配使用, 利用 yymore 來告訴 lex 將下一個 token 附加到目前的 token 上extract string literal“abc”mac”“abc”?* 傳回最後一個引號加入下一個字串* “ a b c ” m a c ”input bufferregular expression yytext“* “ a b c ” m a c ”“a* “ a b c ” m a c ”“ab* “ a b
37、 c ” m a c ”“abc* “ a b c ” m a c ”“abcAnalyzing process 1input bufferregular expression yytextAnalyzing process 2* “ a b c ” m a c ”“abc“yyleng = 6“ a b c ” m a c ”“ a b c ” m a c ”* “abc“abc“ a b c ” m a c ”* “abc“m“ a b c ” m a c ”* “abc“maunput character ” input bufferregular expression yytextAn
38、alyzing process 3“ a b c ” m a c ”* “abc“mac“ a b c ” m a c ”* “abc“mac“fails“abc“mac“yytext0yyleng = 10Starting condition (開始狀態(tài))flex provides a mechanism for conditionally activating rules. Any rule whose pattern is prefixed with will only be active when the scanner is in the start condition named
39、sc. Start conditions are declared in the definitions (first) section of the input using unindented lines beginning with either %s (inclusive start conditions) or %x (exclusive start conditions)Initial starting condition of flex is 0 (INITIAL)A start condition is activated using the BEGIN action. Unt
40、il the next BEGIN action is executed, rules with the given start condition will be active and rules with other start conditions will be inactive.If the start condition is inclusive, then rules with no start conditions at all will also be active.If it is exclusive, then only rules qualified with the
41、start condition will be active. %s example % foo do_something(); bar something_else(); %x example % foo do_something(); bar something_else(); pattern foo is activated in starting condition, examplepattern bar does not specify starting conditions, then all starting conditions declared as inclusive (s
42、) will execute pattern bar%s example % foo do_something(); bar something_else(); The following three lex input are equivalentInclusive v.s. exclusiveHow to recognize comment in C, /* */ comment.txtmain.cpptest.txtCMNT is an exclusive starting conditionIf read /*, change to CMNT If read */, back to I
43、NTIAL Can you explain output?ExerciseC+ support another kind of comment, starting by /, write a regular expression to recognize this kind of comment and build it into flex input file. Write a C program with C-comment and C+-comment to test scanner generated by flex. Depict state transition diagram f
44、or C-comment and C+ comment, write code to implement this state transition diagram and measure program size. Do you think flex helps you identify C-comment very well?Can you have other method to identify C-comment by using flex? Hint: use flex to identify /*, then write code to find */ by yyinput() or input()comment.txtOutLine What is lex Regular expression Finite state machine Content of flex Application- scan configuration file of linear programming- C-program analyzer minsubject to ,0Tzc xAxb xconfigure.txtApplication 1: configuration file of Linear Programmingx1x2x4x5int
溫馨提示
- 1. 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請下載最新的WinRAR軟件解壓。
- 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請聯(lián)系上傳者。文件的所有權益歸上傳用戶所有。
- 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁內容里面會有圖紙預覽,若沒有圖紙預覽就沒有圖紙。
- 4. 未經(jīng)權益所有人同意不得將文件中的內容挪作商業(yè)或盈利用途。
- 5. 人人文庫網(wǎng)僅提供信息存儲空間,僅對用戶上傳內容的表現(xiàn)方式做保護處理,對用戶上傳分享的文檔內容本身不做任何修改或編輯,并不能對任何下載內容負責。
- 6. 下載文件中如有侵權或不適當內容,請與我們聯(lián)系,我們立即糾正。
- 7. 本站不保證下載資源的準確性、安全性和完整性, 同時也不承擔用戶因使用這些下載資源對自己和他人造成任何形式的傷害或損失。
最新文檔
- 保管采購合同范例
- 眾籌開店合同范例
- 別墅裝飾裝修合同范例
- Z銀行X支行績效管理優(yōu)化研究
- PP-PPOH共混鋰離子電池隔膜的制備及其性能研究
- 借款續(xù)簽合同范例
- 會議物料合同范例
- 債權出質擔保合同范例
- 健身房合同范例
- 農(nóng)村拆除合同范例
- 核心素養(yǎng)視域下的小學英語“教學評一體化”實踐研究
- 2025年南昌理工學院單招職業(yè)技能測試題庫審定版
- 2025年湖南高速鐵路職業(yè)技術學院單招職業(yè)適應性測試題庫帶答案
- 2025年黃山職業(yè)技術學院單招職業(yè)傾向性測試題庫及參考答案
- 學校食堂食材采購合同范本
- 冷庫安全培訓
- 2025年內蒙古法院系統(tǒng)招聘用制書記員2988人過渡高頻重點模擬試卷提升(共500題附帶答案詳解)
- 自媒體運營實戰(zhàn)教程(抖音版) 課件 第7、8章 短視頻運營;直播運營
- 2025年陜西西安康本材料有限公司招聘筆試參考題庫含答案解析
- 音頻內容創(chuàng)新策略-洞察分析
- 2024年陜西財經(jīng)職業(yè)技術學院高職單招職業(yè)技能測驗歷年參考題庫(頻考版)含答案解析
評論
0/150
提交評論