Abstract Interpretation with Applications to Timing Validation

Timing Analysis - timing guarantees for hard real-time systems
Reinhard Wilhelm, Saarland University, Saarbrücken

Structure of the Lecture
- Introduction
- Static timing analysis: the problem, our approach, the success, tool architecture
- Cache analysis
- Pipeline analysis
- Value analysis
- Worst-case path determination
- Timing predictability: caches, non-cache-like devices, future architectures
- Conclusion

Industrial Needs
- Hard real-time systems, often in safety-critical applications, abound: aeronautics, automotive, train industries, manufacturing control.
- Wing vibration of an airplane: sensing every 5 ms.
- Side airbag in a car: reaction within 10 ms.
- Crankshaft-synchronous tasks have very tight deadlines: 45 µs.

Hard Real-Time Systems
- Embedded controllers are expected to finish their tasks reliably within time bounds.
- Task scheduling must be performed.
- Essential: an upper bound on the execution times of all tasks must be statically known.
- Commonly called the Worst-Case Execution Time (WCET); analogously, the Best-Case Execution Time (BCET).

Timing Analysis
- The problem: given software that has to produce some reaction, a hardware platform on which to execute the software, and a required reaction time, derive a guarantee for timeliness.
- Timing analysis provides parameters for schedulability analysis: the execution time Ci of each task, or, if exact times are impossible to determine, upper bounds (and maybe also lower bounds) on the execution times of tasks, often called Worst-Case Execution Times (WCET) and Best-Case Execution Times (BCET).

Timing Analysis - the Search Space (architecture with constant execution times)
- All control-flow paths through the binary executable, depending on the possible inputs.
- Feasible as a search for a longest path if iteration and recursion are bounded and the execution times of instructions are (positive) constants.
- Elegant method: Timing Schemata (Shaw 89), an inductive calculation of upper bounds, e.g.
  ub(if b then S1 else S2) := ub(b) + max(ub(S1), ub(S2))
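To make the timing-schema idea concrete, here is a minimal sketch (not from the original slides, and not aiT's implementation) of an inductive bound calculation over a toy structured-program AST; the node classes and the constant per-node costs are assumptions made only for this example.

    from dataclasses import dataclass
    from typing import List, Union

    # Toy AST; per-node costs (in cycles) are assumed constant inputs.
    @dataclass
    class Basic:                 # straight-line code with a known constant cost
        cost: int

    @dataclass
    class Seq:
        stmts: List["Stmt"]

    @dataclass
    class If:
        cond_cost: int
        then_branch: "Stmt"
        else_branch: "Stmt"

    @dataclass
    class Loop:                  # loop with a statically known iteration bound
        cond_cost: int
        body: "Stmt"
        max_iter: int

    Stmt = Union[Basic, Seq, If, Loop]

    def ub(s: Stmt) -> int:
        """Timing-schema upper bound, purely inductive over the program structure."""
        if isinstance(s, Basic):
            return s.cost
        if isinstance(s, Seq):
            return sum(ub(t) for t in s.stmts)
        if isinstance(s, If):
            # ub(if b then S1 else S2) = ub(b) + max(ub(S1), ub(S2))
            return s.cond_cost + max(ub(s.then_branch), ub(s.else_branch))
        if isinstance(s, Loop):
            # condition evaluated max_iter + 1 times, body at most max_iter times
            return (s.max_iter + 1) * s.cond_cost + s.max_iter * ub(s.body)
        raise TypeError(s)

    # Example: an if-statement inside a loop bounded by 10 iterations
    prog = Loop(cond_cost=2, max_iter=10,
                body=If(cond_cost=1, then_branch=Basic(5), else_branch=Basic(9)))
    print(ub(prog))   # 2*11 + 10*(1 + max(5, 9)) = 122

The sketch only works because the per-instruction costs are constants; the rest of the lecture explains why this assumption breaks down on modern processors.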

High-Performance Microprocessors
- They increase (average-case) performance by using caches, pipelines, branch prediction, and speculation.
- These features make timing analysis difficult: execution times of instructions vary widely.
- Best case, everything goes smoothly: no cache miss, operands ready, resources free, branch correctly predicted.
- Worst case, everything goes wrong: all loads miss the cache, resources are occupied, operands are not ready.
- The span between the two may be several hundred cycles.

Variability of Execution Times
- Example (PPC 755): x = a + b; compiles to LOAD r2, _a; LOAD r1, _b; ADD r3, r2, r1.
- In most cases, execution will be fast. So assuming the worst case is safe, but very pessimistic!

AbsInt's WCET Analyzer aiT
- IST Project DAEDALUS final review report: "The AbsInt tool is probably the best of its kind in the world and it is justified to consider this result as a breakthrough."
- Several time-critical subsystems of the Airbus A380 have been certified using aiT; aiT is the only validated tool for these applications.

Tremendous Progress during the 10 years from 1998 to 2008
- [Plot: reported over-estimation of execution-time bounds (Lim et al. 1995, Thesing et al. 2002, Souyris et al. 2005), falling from the 20-50% range down to roughly 10-15%, while cache-miss penalties grew from about 4 to 25 to 60 to 200 cycles.]
- The explosion of penalties has been compensated by the improvement of the analyses!

State-Dependent Execution Times
- Execution times depend on the execution state; the execution state results from the execution history.
- Semantic state: the values of variables. Execution state: the occupancy of resources.

Timing Analysis - the Search Space with State-Dependent Execution Times
- All control-flow paths, depending on the possible inputs.
- All paths through the architecture, for all potential initial states.
- Example (mul rD, rA, rB): the execution states of the paths reaching this program point differ in whether the instruction is in the I-cache or not, whether the bus is occupied or free, and whether the operands are small or large; the slide quotes resulting latencies between 1 and 40 cycles.

Timing Analysis - the Search Space with Out-of-Order Execution
- All control-flow paths, depending on the possible inputs.
- All paths through the architecture, for all potential initial states.
- Including different schedules for instruction sequences.

Timing Analysis - the Search Space with Multi-Threading
- All control-flow paths, depending on the possible inputs.
- All paths through the architecture, for all potential initial states.
- Including different schedules for instruction sequences.
- Including different interleavings of accesses to shared resources.

Why Exhaustive Exploration?
- Naive attempt: follow local worst-case transitions only.
- This is unsound in the presence of timing anomalies: a path starting with a local worst case may have a lower overall execution time. Example: a cache miss preventing a branch mis-prediction.
- Timing anomalies are caused by interference between processor components. Example: a cache hit/miss influences branch prediction; branch prediction causes prefetching; prefetching pollutes the I-cache.

State Space Explosion in Timing Analysis
- [Figure: growth of the analysis problem and of the methods over the years 1995, 2000, 2010. Constant execution times: timing schemata. State-dependent execution times (caches, pipelines, speculation): combined cache and pipeline analysis. Out-of-order and superscalar processors: interleavings of all schedules. Preemptive scheduling and concurrency with shared resources (multi-core): interleavings of several threads; static analysis?]

Notions in Timing Analysis
- Exact worst-case execution times are hard or impossible to determine; determine upper bounds instead.

High-Level Requirements for Timing Analysis
- Upper bounds must be safe, i.e. never underestimated.
- Upper bounds should be tight, i.e. not far away from real execution times.
- Analogously for lower bounds.
- The analysis effort must be tolerable.
- Note: all analyzed programs terminate and loop bounds need to be known, so there is no decidability problem, but a complexity problem!

Timing Accidents and Penalties
- Timing accident: a cause for an increase of the execution time of an instruction.
- Timing penalty: the associated increase.
- Types of timing accidents: cache misses, pipeline stalls, branch mispredictions, bus collisions, memory refresh of DRAM, TLB misses.

Execution Time is History-Sensitive
- The contribution of an instruction's execution to a program's execution time depends on the execution state; e.g. the time for a memory access depends on the cache state.
- The execution state depends on the execution history.
- Needed: an invariant about the set of execution states produced by all executions reaching a program point.
- We use abstract interpretation to compute these invariants.

Deriving Run-Time Guarantees
- Our method and tool, aiT, derives safety properties from these invariants: certain timing accidents will never happen.
- Example: at program point p, instruction fetch will never cause a cache miss.
- The more accidents excluded, the lower the upper bound.
- [Figure: the variance of execution times between the fastest and the slowest run; the "Murphy invariant" corresponds to assuming that every timing accident happens.]

Abstract Interpretation in Timing Analysis
- Abstract interpretation statically analyzes a program for a given property without executing it. Derived properties therefore hold for all executions.
- It is based on the semantics of the analyzed language.
- A semantics of a programming language that talks about time needs to incorporate the execution platform!
- Static timing analysis is thus based on such a semantics.

The Architectural Abstraction inside the Timing Analyzer
- The timing analyzer contains architectural abstractions: a cache abstraction and a pipeline abstraction.
- Value analysis, control-flow analysis, and loop-bound analysis are abstractions of the processor's arithmetic.

Abstract Interpretation in Timing Analysis
- It determines invariants about the values of variables (in registers, on the stack): to compute loop bounds, to eliminate infeasible paths, to determine effective memory addresses.
- It determines invariants on the architectural execution state: cache contents (predict hits and misses), pipeline states (predict or exclude pipeline stalls).

Tool Architecture
- Abstract interpretations (value analysis, loop-bound analysis, control-flow analysis, cache and pipeline analysis) combined with integer linear programming.
- Value analysis: determines enclosing intervals for the sets of values in registers and local variables, used for determining addresses.
- Loop-bound analysis: determines loop bounds.
- Control-flow analysis: determines infeasible paths.

The Story in Detail
- Tool architecture; first component: value analysis.

Value Analysis
- Motivation: provide access information to the data-cache/pipeline analysis, detect infeasible paths, derive loop bounds.
- Method: calculate intervals at all program points, i.e. lower and upper bounds for the sets of possible values occurring in the machine program (addresses, register contents, local and global variables) (Cousot/Halbwachs 78).

Value Analysis II
- Intervals are computed along the CFG edges; at joins, intervals are "unioned", e.g. D1: [-2,+2] joined with D1: [-4,0] gives D1: [-4,+2].
- Example, starting with D1 = [-4,4] and A0 = [0x1000,0x1000]:
  move.l #4,D0        gives D0 = [4,4]
  add.l D1,D0         gives D0 = [0,8]
  move.l (A0,D0),D1   accesses the range [0x1000,0x1008]
- Which address is accessed here? The analysis can only bound it to the range [0x1000,0x1008].
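As an illustration of the interval domain behind this example, here is a minimal sketch; the register names and the tiny transfer functions are assumptions for the example, not aiT's value analysis.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Interval:
        lo: int
        hi: int

        def __add__(self, other: "Interval") -> "Interval":
            # abstract transfer function for addition: add bounds component-wise
            return Interval(self.lo + other.lo, self.hi + other.hi)

        def join(self, other: "Interval") -> "Interval":
            # at control-flow joins: smallest interval containing both arguments
            return Interval(min(self.lo, other.lo), max(self.hi, other.hi))

    # Join example from the slide: [-2,+2] joined with [-4,0] gives [-4,+2]
    print(Interval(-2, 2).join(Interval(-4, 0)))

    # Address computation example: A0 = [0x1000,0x1000], D1 = [-4,4]
    regs = {"A0": Interval(0x1000, 0x1000), "D1": Interval(-4, 4)}
    regs["D0"] = Interval(4, 4)               # move.l #4,D0
    regs["D0"] = regs["D0"] + regs["D1"]      # add.l  D1,D0   -> D0 = [0,8]
    access = regs["A0"] + regs["D0"]          # move.l (A0,D0),D1
    print(hex(access.lo), hex(access.hi))     # 0x1000 0x1008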

Value Analysis (Airbus Benchmark)
- [Table: value-analysis statistics for an Airbus benchmark, measured on a 1 GHz Athlon: analysis run time and memory usage.]

Caches
- Cache sets are independent: everything can be explained in terms of one set.
- LRU replacement strategy: replace the block that has been least recently used; modeled by ages.
- Example, 4-way set-associative cache with ages 0 (youngest) to 3 (oldest), initially containing m0, m1, m2, m3:
  access m4 (miss): m4, m0, m1, m2
  access m1 (hit):  m1, m4, m0, m2
  access m5 (miss): m5, m1, m4, m0
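A small sketch of this concrete LRU update, modeled by ages as on the slide; plain Python, only for illustration.

    def lru_access(cache_set: list, block: str, ways: int = 4) -> tuple:
        """Concrete LRU update of one cache set.
        cache_set[0] is the youngest block (age 0), cache_set[-1] the oldest."""
        hit = block in cache_set
        if hit:
            cache_set = [block] + [b for b in cache_set if b != block]
        else:
            cache_set = ([block] + cache_set)[:ways]   # evict the LRU block if the set is full
        return cache_set, hit

    s = ["m0", "m1", "m2", "m3"]
    for b in ["m4", "m1", "m5"]:
        s, hit = lru_access(s, b)
        print(b, "hit" if hit else "miss", s)
    # m4 miss ['m4', 'm0', 'm1', 'm2']
    # m1 hit  ['m1', 'm4', 'm0', 'm2']
    # m5 miss ['m5', 'm1', 'm4', 'm0']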

Cache Analysis
- How can we statically precompute cache contents?
- Must analysis: for each program point (and context), find out which blocks are guaranteed to be in the cache; this enables the prediction of cache hits.
- May analysis: for each program point (and context), find out which blocks may be in the cache; the complement says what is definitely not in the cache and enables the prediction of cache misses.
- In the following, we consider must analysis until stated otherwise.

(Must) Cache Analysis
- Consider one instruction in the program, say "load a". There may be many paths leading to this instruction.
- How can we compute whether a will always be in the cache, independently of which path execution takes?
- Question: is the access to a always a cache hit?

Determine Cache Information (abstract cache states) at each Program Point
- Abstract cache states record memory blocks by age, from youngest (age 0) to oldest (age 3).
- Example: an abstract state with x at age 1 and a, b at age 2 describes the set of all concrete cache states in which x, a, and b occur, x with an age not older than 1, and a and b with an age not older than 2.
- Cache information contains only memory blocks guaranteed to be in the cache; each is associated with its maximal age.

Cache Analysis: how does it work?
- How can we compute, for each program point, an abstract cache state representing a set of memory blocks guaranteed to be in the cache each time execution reaches this program point?
- Can we expect to compute the largest such set?
- There is a trade-off between precision and efficiency, quite typical for abstract interpretation.

(Must) Cache Analysis of a Memory Access
- Consider an abstract must cache containing x (age 1) and a, b (age 2), and an access to a.
- After the access, a is the youngest memory block in the cache, and we must assume that x has aged. What about b?
- Comparing the concrete transfer function (the LRU update applied to every concrete cache described by the abstract state) with the abstract transfer function of the analysis answers this: only blocks with a smaller maximal age than a need to age, so x ages while b keeps its maximal age.

Combining Cache Information
- Consider two control-flow paths to a program point: for one, the prediction says the set of memory blocks S1 is in the cache, for the other the set S2.
- The cache analysis should not predict more than S1 ∩ S2 after the merge of the paths, and the elements of the intersection should get their maximal age from S1 and S2.
- This suggests the following method: compute cache information along all paths to a program point and calculate their intersection. But there are too many paths!
- More efficient method: combine cache information on the way and iterate until the least fixpoint is reached.
- There is a risk of losing precision, but not in the case of distributive transfer functions.

What happens when control-flow paths merge?
- On one path we can guarantee the blocks a, c, f, d (each with a maximal age); on the other we can guarantee c, e, a, d. Which content can we guarantee after the merge?
- Only a, c, and d, each with the maximum of its ages on the two paths: "intersection + maximal age".
- Combine cache information at each control-flow merge point.
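A compact sketch of this must-cache domain for one k-way LRU set, with the access update and the "intersection + maximal age" join; the dictionary representation and the constant K are choices made for the example, not aiT's data structures.

    K = 4   # associativity; an abstract state maps each guaranteed block to a maximal age < K

    def must_update(state: dict, b: str) -> dict:
        """Abstract transfer function for an access to block b."""
        old_age = state.get(b, K)                        # K means "not guaranteed in cache"
        new = {}
        for blk, age in state.items():
            if blk == b:
                continue
            new_age = age + 1 if age < old_age else age  # only younger blocks age
            if new_age < K:                              # blocks aging out are no longer guaranteed
                new[blk] = new_age
        new[b] = 0                                       # b is now the youngest block
        return new

    def must_join(s1: dict, s2: dict) -> dict:
        """Join at control-flow merges: intersection + maximal age."""
        return {b: max(s1[b], s2[b]) for b in s1.keys() & s2.keys()}

    print(must_update({"x": 1, "a": 2, "b": 2}, "a"))     # {'x': 2, 'b': 2, 'a': 0}
    print(must_join({"a": 0, "c": 1, "f": 1, "d": 2},
                    {"c": 0, "e": 1, "a": 2, "d": 3}))    # intersection: a->2, c->1, d->3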

Must-Cache and May-Cache Information
- The presented cache analysis is a must analysis: it determines safe information about cache hits. Each predicted cache hit reduces the upper bound.
- We can also perform a may analysis: it determines safe information about cache misses. Each predicted cache miss increases the lower bound.

(May) Cache Analysis of a Memory Access
- After an access to a, a is the youngest memory block in the cache, and we must assume that the other blocks (x, y, b, z, ...) have aged; blocks whose minimal age would exceed the associativity are definitely no longer in the cache.

Cache Analysis: Join (may)
- On one path the blocks a, c, f, d may be in the cache, on the other c, e, a, d. After the merge, any of a, c, e, f, d may be in the cache, each with the minimum of its ages on the two paths: "union + minimal age".

Abstract Domain: Must Cache
- Abstraction: a set of concrete caches, e.g. [z,s,x,a], [x,s,z,t], [z,s,x,t], [s,z,x,t], [z,t,x,s], is represented by the abstract cache that keeps the blocks occurring in all of them with their maximal age: here z and x with maximal age 2 and s with maximal age 3.
- Concretization: this abstract cache describes all concrete caches in which z and x occur with an age of at most 2 and s with an age of at most 3; the remaining lines are filled up with any other blocks. The concretization is an over-approximation: it contains more caches than the original set.

Abstract Domain: May Cache
- Abstraction: the same set of concrete caches is represented in the may cache by keeping every block that occurs in at least one of them with its minimal age: here z, s, x with minimal age 0, t with minimal age 1, and a with minimal age 3.
- Concretization: an abstract may cache says what is definitely not in the cache and gives the minimal age of the blocks that may be in the cache. Here, age 0 may only hold one of z, s, x; ages up to 1 may hold z, s, x, t; ages up to 2 likewise z, s, x, t; and ages up to 3 may hold z, s, x, t, a.
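For symmetry with the must-cache sketch above, here is a correspondingly hedged sketch of the may-cache update and its "union + minimal age" join; the same illustrative dictionary representation is used, and the exact aging rule of the real analysis may differ in details.

    K = 4   # associativity; a may state maps each possibly cached block to a minimal age < K

    def may_update(state: dict, b: str) -> dict:
        """Access to b: b becomes the youngest, blocks at least as young as b may age."""
        old_age = state.get(b, K - 1)
        new = {}
        for blk, age in state.items():
            if blk == b:
                continue
            new_age = age + 1 if age <= old_age else age
            if new_age < K:                  # aged out: definitely not in the cache anymore
                new[blk] = new_age
        new[b] = 0
        return new

    def may_join(s1: dict, s2: dict) -> dict:
        """Join at control-flow merges: union of the blocks, minimal age."""
        out = dict(s2)
        for blk, age in s1.items():
            out[blk] = min(age, out.get(blk, K))
        return out

    print(may_join({"a": 0, "c": 1, "f": 1, "d": 2},
                   {"c": 0, "e": 1, "a": 2, "d": 3}))
    # union with minimal ages: a->0, c->0, e->1, f->1, d->2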

Lessons Learned
- Cache analysis, an important ingredient of static timing analysis, provides abstract domains that proved to be sufficiently precise, have a compact representation, have efficient transfer functions, and are quite natural.

Problem Solved?
- We have shown a solution for LRU caches; LRU cache analysis works smoothly.
- The domain has a favorable "structure": the essential information can be summarized compactly.
- LRU is the best strategy under several aspects: performance, predictability, sensitivity.
- And yet, LRU is not the only strategy: Pseudo-LRU (PowerPC 755, Airbus) and FIFO are worse under almost all aspects other than average-case performance!

Contribution to WCET
- Consider a loop "while ... do [max n] ... ref to s ... od" with a reference to memory block s, a hit time t_hit, and a miss time t_miss for that reference.
- Depending on what can be guaranteed, the loop's contribution to the WCET lies between n * t_hit and n * t_miss; if only the first access misses and all later ones hit, it is t_miss + (n-1) * t_hit, and in the converse case t_hit + (n-1) * t_miss.
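For illustration, with assumed numbers that are not from the slides (t_miss = 40 cycles, t_hit = 1 cycle, n = 100 iterations), the difference between these bounds is dramatic:

    t_miss, t_hit, n = 40, 1, 100      # assumed example values
    print(n * t_miss)                  # 4000 cycles: every access assumed to miss
    print(t_miss + (n - 1) * t_hit)    # 139 cycles: only the first access misses
    print(n * t_hit)                   # 100 cycles: every access hits

Predicting hits for all iterations after the first tightens the bound by more than an order of magnitude, which is why the context handling described next matters.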

Contexts
- Cache contents depend on the context, i.e. on calls and loops: in "while cond do ...", the must join at the loop header combines the state before the loop with the back edge; since the first iteration is what loads the cache, the intersection loses most of the information!
- Therefore distinguish basic blocks by contexts: transform loops into tail-recursive procedures, treat loops and procedures in the same way, and use interprocedural analysis techniques.
- VIVU: virtual inlining of procedures, virtual unrolling of loops. Distinguish as many contexts as useful, e.g. one unrolling for caches and one unrolling for branch prediction (the pipeline).

Structure of the Lecture (recap)
- Introduction; static timing analysis (the problem, our approach, the success, tool architecture); cache analysis; pipeline analysis; value analysis; worst-case path analysis; timing predictability (caches, non-cache-like devices, future architectures); conclusion.

Tool Architecture
- Abstract interpretations combined with integer linear programming; next component: pipelines.

Hardware Features: Pipelines
- Ideal case: one instruction per cycle.
- [Figure: overlapped Fetch, Decode, Execute, and Write-Back stages of instructions 1 to 4 in consecutive cycles.]

Pipelines
- Instruction execution is split into several stages.
- Several instructions can be executed in parallel.
- Some pipelines can begin more than one instruction per cycle: VLIW, superscalar.
- Some CPUs can execute instructions out of order.
- Practical problems: hazards and cache misses.

Pipeline Hazards
- Data hazards: operands not yet available (data dependences).
- Resource hazards: consecutive instructions use the same resource.
- Control hazards: conditional branches.
- Instruction-cache hazards: an instruction fetch causes a cache miss.

Static Exclusion of Hazards
- Cache analysis: prediction of cache hits on instruction fetch and on operand fetch or store, e.g. lwz r4, 20(r1) predicted as a hit.
- Dependence analysis: elimination of data hazards, e.g. in the sequence add r4, r5, r6; lwz r7, 10(r1); add r8, r4, r4 the analysis can show that the operand r4 of the last add is ready in time.
- Resource reservation tables: elimination of resource hazards.
- [Figure: pipeline diagram (IF, EX, M, F stages) showing in which cycle the operand becomes ready.]

CPU as a (Concrete) State Machine
- The processor (pipeline, cache, memory, inputs) is viewed as a big state machine that performs a transition every clock cycle.
- Starting in an initial state for an instruction, transitions are performed until a final state is reached.
- End state: the instruction has left the pipeline. The number of transitions is the execution time of the instruction.

A Concrete Pipeline Executing a Basic Block
- function exec(b : basic block, s : concrete pipeline state) -> t : trace
- It interprets the instruction stream of b, starting in state s, and produces a trace t.
- The successor basic block is interpreted starting in the initial state last(t); length(t) gives the number of cycles.
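Purely as an illustration of the "number of transitions equals execution time" view, here is a toy state machine in which the state is just (current instruction, cycles it still needs); it ignores pipeline overlap and everything else, so it is only a caricature of the concrete processor model, with made-up latencies.

    from typing import List, Tuple

    State = Tuple[int, int]    # (index of current instruction, cycles it still needs)

    def step(s: State, latencies: List[int]) -> State:
        """One clock cycle of the toy processor."""
        i, remaining = s
        if remaining > 1:
            return (i, remaining - 1)
        nxt = i + 1
        return (nxt, latencies[nxt]) if nxt < len(latencies) else (nxt, 0)

    def exec_block(latencies: List[int]) -> List[State]:
        """exec(b, s): cycle-wise trace of states for one basic block."""
        trace: List[State] = [(0, latencies[0])]
        while trace[-1][0] < len(latencies):      # final state: the block has retired
            trace.append(step(trace[-1], latencies))
        return trace

    t = exec_block([1, 1, 40])    # e.g. two fast instructions and one load that misses the cache
    print(len(t) - 1)             # 42 transitions = 42 cycles; last(t) would seed the successor block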

An Abstract Pipeline Executing a Basic Block
- function exec(b : basic block, s : abstract pipeline state) -> t : trace
- It interprets the instruction stream of b (annotated with cache information), starting in state s, and produces a trace t; length(t) gives the number of cycles.
- What is different? Abstract states may lack information, e.g. about cache contents; traces may therefore be longer (but never shorter).
- What is the starting state for the successor basic block, in particular if there are several predecessor blocks? Alternatives: keep sets of states, or combine them by a least upper bound (join); it is hard to find a join that preserves information and has a compact representation.

Non-Locality of Local Contributions
- Interference between processor components produces timing anomalies: assuming the local best case may lead to a higher overall execution time, and assuming the local worst case may lead to a shorter overall execution time. Example: a cache miss in the context of branch prediction.
- Treating components in isolation may therefore be unsafe. Implicit assumptions are not always correct: a cache miss is not always the worst case, and the empty cache is not always the worst-case starting point!

An Abstract Pipeline Executing a Basic Block (processor with timing anomalies)
- function analyze(b : basic block, S : analysis state) -> T : set of traces
- Analysis states are elements of 2^(PS x CS), where PS is the set of abstract pipeline states and CS is the set of abstract cache states.
- It interprets the instruction stream of b (annotated with cache information), starting in state S, and produces a set of traces T.
- max(length(T)) is an upper bound for the execution time; last(T) is the set of initial states for the successor block.
- For blocks with several predecessors, the incoming analysis states are combined by union: S3 = S1 ∪ S2.

Integrated Analysis: Overall Picture
- Fixed-point iteration over the basic blocks (in context), with abstract states s1, s2, s3, ... attached to the program points.
- [Figure: basic blocks annotated with abstract states s1 to s13; for each instruction, e.g. move.l (A0,D0),D1, the processor model evolves cycle-wise through abstract states s1 -> s2 -> s3.]

Tool Architecture
- Abstract interpretations combined with integer linear programming.
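To show how the pieces fit together, here is a schematic sketch of the outer fixed-point iteration over basic blocks with sets of analysis states, as required in the presence of timing anomalies; the CFG, the state strings, and the dummy analyze_block transfer function are placeholders, not the real pipeline and cache model.

    from typing import Dict, FrozenSet, List, Tuple

    State = str    # placeholder for an abstract (pipeline state, cache state) pair

    def analyze_block(block: str, states: FrozenSet[State]) -> Tuple[int, FrozenSet[State]]:
        """Placeholder for analyze(b, S): returns (upper bound in cycles, exit states)."""
        return 10, frozenset(s + "'" for s in states)      # dummy transfer function

    def fixed_point(cfg: Dict[str, List[str]], entry: str, init: FrozenSet[State]):
        in_states: Dict[str, FrozenSet[State]] = {b: frozenset() for b in cfg}
        in_states[entry] = init
        bounds: Dict[str, int] = {}
        changed = True
        while changed:                        # iterate until the least fixpoint is reached
            changed = False
            for b in cfg:
                if not in_states[b]:
                    continue
                bound, out = analyze_block(b, in_states[b])
                bounds[b] = bound
                for succ in cfg[b]:           # blocks with several predecessors: union of states
                    joined = in_states[succ] | out
                    if joined != in_states[succ]:
                        in_states[succ] = joined
                        changed = True
        return bounds, in_states

    cfg = {"B0": ["B1", "B2"], "B1": ["B3"], "B2": ["B3"], "B3": []}
    bounds, _ = fixed_point(cfg, "B0", frozenset({"s0"}))
    print(bounds)   # per-block upper bounds

In the real tool, such per-block bounds feed the worst-case path determination by integer linear programming, as listed in the tool architecture above.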
