Alibaba Cloud High Performance Computing: Optimization Tools

Optimization Tools

Kepler Assembler

Overview

Kepler Assembler is a native assembler for NVIDIA Kepler-architecture GPUs. It generates native assembly from a binary file and also performs the reverse process. It lets users directly control instruction scheduling and register usage, which makes it possible to extract the peak performance of Kepler-series GPUs. The tool can be accessed at the following address: /tools/assembler

Introduction

Kepler Assembler is an almost full-featured assembler created for NVIDIA's Kepler architecture. It can generate binary files (.cubin) from assembly code, or dump SASS from a given cubin. Except for some descriptive kernel information, including parameter numbers and kernel names, the SASS format supported by this tool is the same as the one dumped by cuobjdump. Dumping SASS from an existing cubin is useful because most people find it difficult to write a whole assembly file from scratch. Dumping assembly code from a cubin compiled by nvcc, tuning it, and then regenerating the cubin is therefore the more convenient and efficient way to use this tool.

Like asfermi and maxas, this tool was written to reach extremely high device utilization. The tools provided by NVIDIA did not support all of our needs in the pursuit of ultimate performance. With Kepler Assembler it becomes possible to control a kernel's register usage directly, to adjust the execution order of instructions, and even to change the scheduling details of a kernel.

From Fermi to Maxwell, NVIDIA has changed a number of things in every new architecture generation, for example the instruction scheduling strategy and the ISA encoding. An assembler of this kind therefore cannot offer much backward compatibility: maxas cannot be used on the Kepler architecture, Kepler Assembler will not work on the Fermi architecture, and vice versa.

Input Assembly File Format

If you want to generate a cubin file from your own hand-tuned assembly, you should first learn to write your kernel in SASS as shown below.

    Kernel number: 1
    KernelName: yourKernelName
    Para yourKernelName: num|4 size|32
    ParaDetail yourKernelName: para1|8,para2|8,para3|8,para4|8
    Shared yourKernelName: size|400 align|16
    Reg yourKernelName: 13
    code:
    /*0008*/ CTL: 00101100 MOV R1, c[0x0][0x44];
    /*0010*/ CTL: 00101000 S2R R0, SR_CTAID.X;
    /*0018*/ CTL: 00101000 MOV32I R12, 0x4;
    /*0020*/ CTL: 00101000 S2R R3, SR_TID.X;
    /*0028*/ CTL: 00100011 IMAD R2, R0, c[0x0][0x28], R3;
    /*0030*/ CTL: 00000100 SHF.L.W R5, RZ, 0x2, R2;
    /*0038*/ CTL: 00000000 IADD R8.CC, R5, c[0x0][0x148];
    ...
    /*1c88*/ CTL: 00000000 STS [R5], R4;
    /*1c90*/ CTL: 00000000 ST.E [R8], R0;
    /*1c98*/ CTL: 00000000 EXIT;
    /*1ca0*/ CTL: 00000000 BRA 0xe0;
    /*1ca8*/ CTL: 00101110 NOP;
    /*1cb0*/ CTL: 00000100 NOP;
    /*1cb8*/ CTL: 00101110 NOP;
    code

You should begin your SASS file with a line that declares the number of kernels you want to include in your cubin file.

    Kernel number: 10

means your cubin file will have 10 CUDA kernels.

A line that starts with KernelName begins the information for a particular kernel.

    KernelName: yourKernelName

The line above states that the current kernel is named yourKernelName.

Use lines that start with Para and ParaDetail to give the assembler the current kernel's parameter information, including the total number of parameters, the total parameter size, and the size of each parameter. For example, if your kernel in CUDA C is

    __global__ void yourKernelName(float *yourPara1, float yourPara2, int *yourPara3, bool yourPara4)

then you should write

    KernelName: yourKernelName
    Para yourKernelName: num|4 size|32
    ParaDetail yourKernelName: para1|8,para2|4,para3|8,para4|1

The number after | is the value of the corresponding variable. For example, num|4 means the current kernel has 4 parameters, and size|32 means its total parameter size is 32 bytes. In the ParaDetail line, para1|8 means the first parameter is 8 bytes, para2|4 means the second parameter is 4 bytes, and so forth. Note that in this line the string before | must be one of para1, para2, ..., paraN.

Lines with Shared describe the kernel's shared memory usage. All shared memory that a kernel uses should be taken into account, including static and dynamic shared memory (at compile time, dynamic shared memory is zero in most cases). Note that the shared memory size should be set with shared memory alignment in mind, because different types obey different alignment rules.

Lines with Reg tell Kepler Assembler how many registers the kernel uses per thread.
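The ParaDetail line described above is mechanical to produce once the parameter sizes are known. The following is a minimal sketch of that step; the helper name makeParaDetail is hypothetical and not part of Kepler Assembler, and the sizes are assumed to be supplied by the caller (for example from sizeof on a 64-bit host).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of Kepler Assembler): builds the ParaDetail
// line for one kernel from the size in bytes of each parameter, using the
// "paraN|size" layout shown in the examples of the text.
std::string makeParaDetail(const std::string& kernelName,
                           const std::vector<std::size_t>& sizes) {
    std::string line = "ParaDetail " + kernelName + ": ";
    for (std::size_t i = 0; i < sizes.size(); ++i) {
        if (i != 0) {
            line += ",";  // entries are separated by commas, without spaces
        }
        // Parameters are numbered from 1: para1, para2, ..., paraN.
        line += "para" + std::to_string(i + 1) + "|" + std::to_string(sizes[i]);
    }
    return line;
}
```

Calling makeParaDetail("yourKernelName", {8, 4, 8, 1}) reproduces the ParaDetail line of the four-parameter example. The Para line is left out of the sketch on purpose, since the text does not spell out how the total size is derived from the individual sizes.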

Every line between the two code markers is made up of three parts. The left-hand part of each line is the instruction address, for example /*0008*/. The second part is the control code. For comparison, cuobjdump prints an sm_35 kernel as follows:

    code for sm_35
        Function : _Z5test1PfS_S_S_i
    .headerflags @"EF_CUDA_SM35 EF_CUDA_PTX_SM(EF_CUDA_SM35)"
                                                          /* 0x08ac80a0a08c1000 */
        /*0008*/ MOV R1, c[0x0][0x44];                    /* 0x64c03c00089c0006 */
        /*0010*/ S2R R0, SR_CTAID.X;                      /* 0x86400000129c0002 */
        /*0018*/ MOV32I R4, 0x4;                          /* 0x74000000021fc012 */
        /*0020*/ S2R R3, SR_TID.X;                        /* 0x86400000109c000e */
        /*0028*/ IMAD R5, R0, c[0x0][0x28], R3;           /* 0x51080c00051c0016 */
        /*0030*/ IMAD R2.CC, R5, R4, c[0x0][0x150];       /* 0x910c10002a1c140a */
        /*0038*/ MOV R0, c[0x0][0x160];                   /* 0x64c03c002c1c0002 */
                                                          /* 0x088c10a010b010a0 */
        /*0048*/ IMAD.HI.X R3, R5, R4, c[0x0][0x154];     /* 0x931810002a9c140e */
        /*0050*/ IMAD R6.CC, R5, R4, c[0x0][0x158];       /* 0x910c10002b1c141a */

As shown above, 0x08ac80a0a08c1000 and 0x088c10a010b010a0 are control codes. Each of them influences the scheduling behavior of the 7 instructions that follow it. Kepler Assembler divides every control line into 7 parts and puts each part ahead of a single instruction line; this is the CTL: xxxxxxxx part of a code line. It determines how two instructions (the one right after it and the one on the next line) are issued. We have done some experiments to understand how control codes work, but many questions remain unsolved, and we are glad to discuss them with anyone who is interested.

The third part of a code line is the instruction, which is issued and executed on the processor at run time. The details of the instruction format are given in section 5.
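To make the idea of dividing one control line into 7 parts concrete, here is a minimal sketch that cuts a 64-bit control word into seven 8-bit fields. The bit layout is an assumption and is not stated in the text: the sketch supposes that the seven fields sit in bits 2 to 57, eight bits each, starting from the lowest bits. The function name splitControlWord is hypothetical, and the sketch does not say which field belongs to which of the 7 instructions.

```cpp
#include <array>
#include <cstdint>

// Splits a 64-bit control word into seven 8-bit fields.
// ASSUMPTION (not from the text): field i occupies bits [2 + 8*i, 9 + 8*i],
// so fields[0] comes from the lowest bits and fields[6] from the highest.
std::array<std::uint8_t, 7> splitControlWord(std::uint64_t word) {
    std::array<std::uint8_t, 7> fields{};
    for (int i = 0; i < 7; ++i) {
        // Shift the wanted field down to bit 0 and keep its 8 bits.
        fields[i] = static_cast<std::uint8_t>((word >> (2 + 8 * i)) & 0xFFu);
    }
    return fields;
}
```

Under this assumed layout, the control word 0x08ac80a0a08c1000 from the cuobjdump listing splits into 0x00, 0x04, 0x23, 0x28, 0x28, 0x20 and 0x2b, from the lowest field to the highest. Written in binary, these are 8-bit values in the same style as the CTL: entries of an assembly file.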

Use Kepler Assembler

NVIDIA provides the CUDA module API. This API lets us load a CUDA binary and launch precompiled kernels from the host, which gives us a practical way to use our own hand-tuned kernels. Here we give a complete example of how to use this assembler to tune a kernel. The example starts from a kernel written in CUDA C. The kernel is compiled by nvcc to generate a cubin. Kepler Assembler converts this cubin file into an assembly code file, which you can then modify. When all optimizations are complete, the assembly file is passed to Kepler Assembler to regenerate a cubin. Finally, this cubin is used in a sample C program through the CUDA module API.

    __global__ void testFunction(float* matrix1, float* matrix2, float* output_array1, float* output_array2, int n)
    {
        // One thread per column.
        int idx = blockIdx.x * blockDim.x + threadIdx.x;
        float dataFromM1, dataFromM2;
        output_array1[idx] = 0.0;
        output_array2[idx] = 0.0;
        for (int i = 0; i < n; i++) {
            dataFromM1 = matrix1[idx + i * n];
            dataFromM2 = matrix2[idx + i * n];
            // Long multiply chains keep the kernel compute bound.
            output_array1[idx] += dataFromM1*dataFromM1*dataFromM1*dataFromM1*dataFromM1*dataFromM1*dataFromM1*dataFromM1*dataFromM1*dataFromM1*dataFromM1*dataFromM1*dataFromM1*dataFromM1;
            output_array2[idx] += dataFromM2*dataFromM2*dataFromM2*dataFromM2*dataFromM2*dataFromM2*dataFromM2*dataFromM2*dataFromM2*dataFromM2*dataFromM2*dataFromM2*dataFromM2*dataFromM2;
        }
    }

This is a quite simple kernel: two n-by-n matrices are input, every element of each matrix is raised to the 14th power, and the results are summed along each column and stored in an array. The kernel looks a little odd; we use it because its performance is limited by computation. We suggest using this assembler only after you have done all the regular optimizations, since writing and editing assembly is challenging work.

Save this function in test.cu, and compile it to generate a cubin file for the sm_35 architecture.

    nvcc -gencode arch=compute_35,code=sm_35 -cubin test.cu

Now we have test.cubin, which is an ELF file. Upload this cubin to our website, and the corresponding assembly code is dumped as follows. A line that begins with # is a comment.

    Kernel number: 1
    #Current Kernel Name. Note: a kernel name should be made up of letters, digits and the underscore character.
    KernelName: _Z12testFunctionPfS_S_S_i
    #Parameter Number and Parameter Size.
    Para _Z12testFunctionPfS_S_S_i: num|5 size|36
    #Size of every parameter particularly. Note: para1 represents the first parameter, para2 represents the second, and so forth.
    ParaDetail _Z12testFunctionPfS_S_S_i: para1|8,para2|8,para3|8,para4|8,para5|4
    #Register Number per Thread.
    Reg _Z12testFunctionPfS_S_S_i: 16

Now you can play with this kernel through its assembly code. Try to change some control codes, or move an instruction to another line. Here we just change the control codes in a few lines. Replace

    /*0148*/ CTL: 00100111 FMUL R12, R12, R9;
    /*0150*/ CTL: 00100000 FMUL R11, R11, R10;
    /*0158*/ CTL: 00100111 FMUL R12, R12, R9;
    /*0160*/ CTL: 00100000 FMUL R11, R11, R10;
    /*0168*/ CTL: 00100111 FMUL R13, R12, R9;
    /*0170*/ CTL: 00100000 FMUL R11, R11, R10;
    /*0178*/ CTL: 00100000 LD.E R12, [R2];

with

    /*0148*/ CTL: 00000101 FMUL R12, R12, R9;
    /*0150*/ CTL: 00000100 FMUL R11, R11, R10;
    /*0158*/ CTL: 00000101 FMUL R12, R12, R9;
    /*0160*/ CTL: 00000100 FMUL R11, R11, R10;
    /*0168*/ CTL: 00000101 FMUL R13, R12, R9;
    /*0170*/ CTL: 00000100 FMUL R11, R11, R10;
    /*0178*/ CTL: 00100000 LD.E R12, [R2];

When your assembly file is ready, you can use Kepler Assembler to generate the corresponding cubin. Download this cubin, and congratulations: you now have a self-optimized cubin file. In the next section, we will show you how to use this cubin in your own project.

Use CUBIN in Your Program

We use the host-side CUDA module API to load a cubin and launch the kernels in it. Here is an example that uses the cubin file generated in the previous section. Before you can launch a particular kernel in a cubin, the module has to be loaded and the kernel function retrieved.

    #include <cuda.h>
    #include <iostream>

    CUdevice cuDevice;
    CUcontext cuContext;
    CUmodule cuModule;
    CUresult error;

    cuInit(0);
    int devID = 0;

    // get device
    error = cuDeviceGet(&cuDevice, devID);
    if (error != CUDA_SUCCESS)
        std::cout << "cuDeviceGet error! Error code: " << error << std::endl;

    // create context
    error = cuCtxCreate(&cuContext, 0, cuDevice);
    if (error != CUDA_SUCCESS)
        std::cout << "cuCtxCreate error! Error code: " << error << std::endl;

    // load module
    error = cuModuleLoad(&cuModule, "test.cubin");
    if (error != CUDA_SUCCESS)
        std::cout << "cuModuleLoad error! Error code: " << error << std::endl;

    // get function
    CUfunction test1;
    error = cuModuleGetFunction(&test1, cuModule, "kernelNameinCubin");
    if (error != CUDA_SUCCESS)
        std::cout << "cuModuleGetFunction error! Error code: " << error << std::endl;

Note that the kernel name passed to cuModuleGetFunction should be the name of the kernel in the cubin file or the assembly, not the one in the CUDA C program, because nvcc changes your kernel name when it compiles a program.
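To check which CUDA C kernel a name found in a cubin belongs to, the changed (mangled) name can be translated back on the host. The sketch below assumes a GCC or Clang toolchain, whose C++ runtime provides abi::__cxa_demangle in cxxabi.h; the wrapper name demangle is hypothetical and is not part of CUDA or Kepler Assembler.

```cpp
#include <cxxabi.h>

#include <cstdlib>
#include <string>

// Returns the human-readable form of a mangled C++ name, or the input
// unchanged if it cannot be demangled.
// Relies on abi::__cxa_demangle, which GCC and Clang runtimes provide.
std::string demangle(const std::string& mangled) {
    int status = 0;
    char* readable = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || readable == nullptr) {
        return mangled;
    }
    std::string result(readable);
    std::free(readable);  // the buffer is allocated with malloc by the runtime
    return result;
}
```

For the kernel name used in this walkthrough, demangle("_Z12testFunctionPfS_S_S_i") gives testFunction(float*, float*, float*, float*, int), which matches the signature of the CUDA C kernel. The mangled form is still the one to pass to cuModuleGetFunction.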

After the input parameters are declared and initialized, use an array of pointers to wrap up their addresses.

    // n is the kernel's fifth parameter (int n)
    void* args[] = {&matrix1, &matrix2, &outputArray1, &outputArray2, &n};
    error = cuLaunchKernel(test1, blockNum.x, blockNum.y, blockNum.z, threadNum.x, threadNum.y, threadNum.z, shareMemSize, NULL, args, NULL);
    if (error != CUDA_SUCCESS)
        std::cout << "cuLaunchKernel error! Error code: " << error << std::endl;

In this way, you can create your project and run your handwritten CUDA kernels on the Kepler architecture.

Frankly, tuning a kernel through native code is typically a challenging job and needs some experiments to unveil all kinds of details about Kepler devices. But if you want to make your kernel more efficient and you have already tried every other method to optimize your code, you should give this assembler a shot. After you reduce register usage or relocate some instructions so that they issue with fewer stalls, you may get a significant performance improvement.

The only purpose of the example above is to walk you through the whole process of using Kepler Assembler. We did not try to do any deep optimization. Even so, after simply changing the control codes mentioned above, we could still accelerate this kernel by about 0.5 us.

Kepler Instruction Set

operand

In Kepler native assembly, an operand is mainly one of three types:

register: register numbers range from 0 to 255 (use RZ instead of R255);
immediate number: floating-point numbers use decimal form; integers use hexadecimal form;
constant memory: constant memory has the form c[imm1][imm2], where imm1 and imm2 are both hexadecimal numbers, imm1 is the bank number of constant memory, and imm2 is the offset within that bank.

From now on, we use composite_operand in a field where all three types are allowed. For example, the usage of FADD is:

    FADD rd, r1, composite_operand;

which covers the following forms:

    FADD rd, r1, r2;
    FADD rd, r1, c;
    FADD rd, r1, imm;

There are also other kinds of registers, for example predicate registers (ranging from 0 to 7, using PT instead of P7).
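The constant memory form c[imm1][imm2] described above can be taken apart with a few lines of host code. The following is a minimal sketch; the struct ConstOperand and the function parseConstOperand are hypothetical names introduced only for illustration and are not part of Kepler Assembler.

```cpp
#include <cstdio>
#include <string>

// Bank number and byte offset of a constant memory operand c[imm1][imm2].
struct ConstOperand {
    unsigned bank;
    unsigned offset;
};

// Parses text such as "c[0x0][0x44]". Returns true and fills `out` only if
// the whole string matches the form c[hex][hex]; otherwise returns false.
bool parseConstOperand(const std::string& text, ConstOperand& out) {
    unsigned bank = 0;
    unsigned offset = 0;
    int consumed = 0;
    // %x accepts an optional 0x prefix; %n records how many characters matched
    // and is only reached if the closing bracket was matched as well.
    int matched = std::sscanf(text.c_str(), "c[%x][%x]%n", &bank, &offset, &consumed);
    if (matched != 2 || consumed != static_cast<int>(text.size())) {
        return false;
    }
    out.bank = bank;
    out.offset = offset;
    return true;
}
```

For the operand c[0x0][0x44] from the earlier listings, the sketch reports bank 0 and offset 0x44, while a register operand such as R12 is rejected.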

1. FFMA

Description: FP32 fused multiply-add. It has two instructions, FFMA32I and FFMA:
FFMA32I: the immediate number is fully IEEE 754 compatible, which means it has a 23-bit mantissa in IEEE 754 form;
FFMA: the immediate number is only partially IEEE 754 compatible; it has only an 11-bit mantissa in IEEE 754 form.

Usage:

    FFMA32I(.SAT)(.FTZ)(.FMZ) rd(.CC), (-)r1, imm, (-)r3;
    FFMA(.rnd)(.FMZ)(.FTZ)(.SAT) rd(.CC), (-)r1, composite_operand, (-)r3;
    FFMA(.rnd)(.FMZ)(.FTZ)(.SAT) rd(.CC), (-)r1, r2, (-)c;

Others:
In the FFMA32I instruction, r3 must be the same as rd.
rnd refers to the rounding mode; it can be one of RD, RZ, RP, and RM. If it is not specified, the default rounding mode is RN.
.CC sets the carry bit.
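The difference between the 23-bit and the 11-bit mantissa can be checked on the host before choosing an instruction form. The sketch below assumes that the shorter immediate simply drops the low 12 bits of the 23-bit mantissa; this is an interpretation of the description above, not something the text states, and the function name fitsShortImmediate is hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// True if `value` is unchanged when only the sign bit, the 8 exponent bits
// and the top 11 mantissa bits of its IEEE 754 single-precision form are kept.
// ASSUMPTION (not from the text): the short immediate truncates the low
// 12 mantissa bits.
bool fitsShortImmediate(float value) {
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);  // well-defined way to read the bit pattern
    return (bits & 0xFFFu) == 0;
}
```

Under this assumption, values such as 1.0f and 1.5f pass the check, while 0.1f does not, so a constant like 0.1f would need the full-width immediate of FFMA32I to be represented exactly.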

2. FADD

Description: FP32 add. It also has two instructions, FADD32I and FADD.

Usage:

    FADD32I(.FTZ) rd(.CC), (-)(|)r1(|), imm(.NEG);
    FADD(.rnd)(.SAT)(.FTZ) rd(.CC), (-)(|)r1(|), (-)(|)composite_operand(.ABS)(.NEG)(|);

Others:
When composite_operand refers to an immediate number, use the (.ABS) and (.NEG) flags instead of (-) and (|).

3. FCMP

Description: FP32 compare

Usage:

    FCMP(.compare_op)(.FTZ) rd, r1, composite_operand, r3;
    FCMP(.compare_op)(.FTZ) rd, r1, r2, c;

Others:
compare_op refers to one of GT, NE, GE, NUM, NAN, LTU, EQU, and LEU

4. FMUL

Description: FP32 multiply. It has two instructions, FMUL32I and FMUL.

Usage:

    FMUL32I(.FMZ)(.FTZ)(.SAT) rd(.CC), imm;
    FMUL(.rnd)(.SAT)(.FMZ)(.FTZ)(.M8)(.D4)(.D2) rd(.CC), (-)r1, composite_operand;

Others:

5. FMNMX

Description: FP32 minimum/maximum

Usage:

    FMNMX(.FTZ) rd(.CC), (-)(|)r1(|), (-)(|)composite_operand(|)(.ABS)(.NEG), (!)PT;

Others:
PT refers to MAX while !PT refers to MIN

6. FSWZ

Description: FP32 swizzle

Usage:

    FSWZ(.rnd)(.FTZ)(.mode)(.NDV) rd(.CC), r1, r2, PPPPPPPP;

Others:
.mode refers to one of .0000, .1111, .2222, .3333, .1032, .2301

7. FSET

Description: FP32 set

Usage:

    FSET(.FTZ)(.BF).compare_op.logic_op rd(.CC), (-)(|)r1(|), (-)(|)composite_operand(|)(.ABS)(.NEG);

Others:
logic_op refers to one of AND, OR, and XOR
compare_op refers to one of F, LT, EQ, LE, GT, NE, GE, NUM, NAN, LTU, EQU, LEU, GTU, NEU, GEU, T

8. FSETP

Description: FP32 set predicate

Usage:

    FSETP(.FTZ)(.compare_op)(.logic_op) p1, p2, (-)(|)r1(|), (-)(|)composite_operand(|)(.ABS)(.NEG);

Others:
compare_op and logic_op are the same as for FSET

9. FCHK

Description: FP32 division test

Usage:

    FCHK.divide p1, (-)(|)r1(|), (-)(|)composite_operand(|)(.ABS)(.NEG);

Others:

10. RRO

Description: FP range reduction operator

Usage:

    RRO.SINCOS(.EX2) rd, (-)(|)composite_operand(|)(.ABS)(.NEG);

Others:

11. MUFU

Description: FP multi-function operator

Usage:

    MUFU.function_type(.SAT) rd, (-)(|)r1(|);

Others:
function_type refers to one of COS, SIN, EX2, LG2, RCP, RSQ, RCP64, RSQ64

12. DFMA

Description: FP64 fused multiply-add

Usage:

    DFMA(.rnd) rd(.CC), (-)r1, composite_operand, (-)r3;
    DFMA(.rnd) rd(.CC), (-)r1, r2, (-)c;

Others:

13. DADD

Description: FP64 add

Usage:

    DADD(.rnd) rd(.CC), (-)(|)r1(|), (-)(|)composite_operand(|)(.ABS)(.NEG);

Others:

14. DMUL

Description: FP64 multiply

Usage:

    DMUL(.rnd) rd(.CC), (-)r1, composite_operand;

Others:

15. DMNMX

Description: FP64 minimum/maximum

Usage:

    DMNMX rd(.CC), (-)(|)r1(|), (-)(|)composite_operand(|)(.ABS)(.NEG), p1;

Others:

16. DSET

Description: FP64 set

Usage:

    DSET.compare_op.logic_op(.BF) rd(.CC), (-)(|)r1(|), (-)(|)composite_operand(|)(.ABS)(.NEG), (!)p1;

Others:
compare_op and logic_op are the same as for FSET

17. DSETP

Description: FP64 set predicate

Usage:

    DSETP.compare_op.logic_op p1, p2, (-)(|)r1(|), (-)(|)composite_operand(|)(.ABS)(.NEG), (!)p3;

Others:

18. IMAD

Description: Integer multiply-add. It has two instructions, IMAD32I and IMAD.

Usage:

    IMAD32I(.HI)(.d_type)(.s_type)(.PO) rd(.CC), (-)r1, imm, (-)r3;
    IMAD(.HI)(.d_type)(.s_type)(.PO)(.X)(.SAT) rd(.CC), (-)r1, composite_operand, (-)r3;
    IMAD(.HI)(.d_type)(.s_type)(.PO)(.X)(.SAT) rd(.CC), (-)r1, r2, (-)c;

Others:
d_type is one of U32 and S32;
s_type is one of U32 and S32;
if neither is specified, both default to S32;
if one is U
