Python For Data Science Cheat Sheet
Python Basics
DataCamp: Learn Python for Data Science Interactively
Variables and Data Types
Variable Assignment
>>> x = 5
>>> x
5
Calculations With Variables
>>> x + 2           Sum of two variables
7
>>> x - 2           Subtraction of two variables
3
>>> x * 2           Multiplication of two variables
10
>>> x ** 2          Exponentiation of a variable
25
>>> x % 2           Remainder of a variable
1
>>> x / float(2)    Division of a variable
2.5
Libraries
Import libraries
>>> import numpy
>>> import numpy as np
Selective import
>>> from math import pi
[Figure: library logos captioned "Data analysis", "Machine learning", "Scientific computing" and "2D plotting"]
Install Python
[Figure: three logos captioned "Leading open data science platform powered by Python", "Free IDE that is included with Anaconda" and "Create and share documents with live code, visualizations, text, ..."]
Numpy Arrays
Also see Lists
>>> my_list = [1, 2, 3, 4]
>>> my_array = np.array(my_list)
>>> my_2darray = np.array([[1, 2, 3], [4, 5, 6]])
Selecting Numpy Array Elements
Index starts at 0
Subset
>>> my_array[1]                          Select item at index 1
2
Slice
>>> my_array[0:2]                        Select items at index 0 and 1
array([1, 2])
Subset 2D Numpy arrays
>>> my_2darray[:, 0]                     my_2darray[rows, columns]
array([1, 4])
Numpy Array Operations
>>> my_array > 3
array([False, False, False,  True])
>>> my_array * 2
array([2, 4, 6, 8])
>>> my_array + np.array([5, 6, 7, 8])
array([ 6,  8, 10, 12])
Lists
Also see NumPy Arrays
>>> a = 'is'
>>> b = 'nice'
>>> my_list = ['my', 'list', a, b]
>>> my_list2 = [[4, 5, 6, 7], [3, 4, 5, 6]]
Selecting List Elements
Index starts at 0
Subset
>>> my_list[1]          Select item at index 1
>>> my_list[-3]         Select 3rd last item
Slice
>>> my_list[1:3]        Select items at index 1 and 2
>>> my_list[1:]         Select items after index 0
>>> my_list[:3]         Select items before index 3
>>> my_list[:]          Copy my_list
Subset Lists of Lists
>>> my_list2[1][0]      my_list[list][itemOfList]
>>> my_list2[1][:2]
List Methods
>>> my_list.index(a)          Get the index of an item
>>> my_list.count(a)          Count an item
>>> my_list.append('!')       Append an item at a time
>>> my_list.remove('!')       Remove an item
>>> del(my_list[0:1])         Remove an item
>>> my_list.reverse()         Reverse the list
>>> my_list.extend('!')       Append an item
>>> my_list.pop(-1)           Remove an item
>>> my_list.insert(0, '!')    Insert an item
>>> my_list.sort()            Sort the list
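The list methods above all change the list in place, and only `index`, `count` and `pop` return a useful value. The following is a minimal sketch of that behaviour. The variable names `position`, `occurrences` and `last` are illustrative and do not come from the cheat sheet.

```python
my_list = ['my', 'list', 'is', 'nice']

position = my_list.index('is')       # index of the first matching item
occurrences = my_list.count('is')    # number of times the item appears

my_list.append('!')                  # add a single item at the end
my_list.extend(['a', 'b'])           # add every item of an iterable
my_list.remove('!')                  # remove the first matching item
last = my_list.pop(-1)               # remove and return the last item
my_list.insert(0, 'start')           # insert before index 0
del my_list[0:1]                     # delete a slice (here: the first item)
my_list.reverse()                    # reverse in place, returns None
my_list.sort()                       # sort in place, returns None

print(position, occurrences, last, my_list)
```

Because `reverse()` and `sort()` return `None`, writing `my_list = my_list.sort()` would discard the list; `sorted(my_list)` is the variant that returns a new list.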
Types and Type Conversion
str()      '5', '3.45', 'True'     Variables to strings
int()      5, 3, 1                 Variables to integers
float()    5.0, 1.0                Variables to floats
bool()     True, True, True        Variables to booleans
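The conversion table can be reproduced with a few list comprehensions. This is a small sketch; the input list `values` and the result names are chosen here for illustration only.

```python
values = [5, 3.45, True]

as_strings = [str(v) for v in values]       # text representation of each value
as_integers = [int(v) for v in values]      # int() truncates 3.45 and maps True to 1
as_floats = [float(v) for v in (5, True)]   # True converts to 1.0
as_booleans = [bool(v) for v in values]     # any non-zero number is truthy

# int('3.45') raises ValueError, so a numeric string with a decimal
# point has to go through float() first.
from_text = int(float('3.45'))

print(as_strings, as_integers, as_floats, as_booleans, from_text)
```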
List Operations
>>> my_list + my_list
['my', 'list', 'is', 'nice', 'my', 'list', 'is', 'nice']
>>> my_list * 2
['my', 'list', 'is', 'nice', 'my', 'list', 'is', 'nice']
>>> my_list2 > 4        Python 2 only; Python 3 raises TypeError when a list is compared with an int
True
Asking For Help
>>> help(str)
Strings
>>> my_string = 'thisStringIsAwesome'
>>> my_string
'thisStringIsAwesome'
String Operations
>>> my_string * 2
'thisStringIsAwesomethisStringIsAwesome'
>>> my_string + 'Innit'
'thisStringIsAwesomeInnit'
>>> 'm' in my_string
True
Index starts at 0
>>> my_string[3]
>>> my_string[4:9]
String Methods
>>> my_string.upper()              String to uppercase
>>> my_string.lower()              String to lowercase
>>> my_string.count('w')           Count String elements
>>> my_string.replace('e', 'i')    Replace String elements
>>> my_string.strip()              Strip whitespaces
Numpy Array Functions
>>> my_array.shape                       Get the dimensions of the array
>>> np.append(my_array, other_array)     Append items to an array
>>> np.insert(my_array, 1, 5)            Insert items in an array
>>> np.delete(my_array, [1])             Delete items in an array
>>> np.mean(my_array)                    Mean of the array
>>> np.median(my_array)                  Median of the array
>>> np.corrcoef(my_array)                Correlation coefficient
>>> np.std(my_array)                     Standard deviation
Jupyter Notebook
Working with Different Programming Languages
Kernels provide computation and communication with front-end interfaces like the notebooks. There are three main kernels: IPython, IRkernel and IJulia.
Installing Jupyter Notebook will automatically install the IPython kernel.
Restart kernel
Restart kernel & run all cells
Interrupt kernel
Interrupt kernel & clear all output
Connect back to a remote notebook
Run other installed kernels
Widgets
Notebook widgets provide the ability to visualize and control changes in your data, often as a control like a slider, textbox, etc.
You can use them to build interactive GUIs for your notebooks or to synchronize stateful and stateless information between Python and JavaScript.
Download serialized state of all widget models in use
Save notebook with interactive widgets
Embed current widgets
Saving/Loading Notebooks
Create new notebook
Open an existing notebook
Make a copy of the current notebook
Rename notebook
Save current notebook and record checkpoint
Revert notebook to a previous checkpoint
Preview of the printed notebook
Download notebook as: IPython notebook, Python, HTML, Markdown, reST, LaTeX
Close notebook & stop running any scripts
Writing Code And Text
Code and text are encapsulated by 3 basic cell types: markdown cells, code cells, and raw NBConvert cells.
Edit Cells
Cut currently selected cells to clipboard
Copy cells from clipboard to current cursor position
Paste cells from clipboard above current cell
Paste cells from clipboard below current cell
Paste cells from clipboard on top of current cell
Delete current cells
Revert “Delete Cells” invocation
Split up a cell from current cursor position
Merge current cell with the one above
Merge current cell with the one below
Move current cell up
Move current cell down
Adjust metadata underlying the current notebook
Find and replace in selected cells
Remove cell attachments
Copy attachments of current cell
Paste attachments of current cell
Insert image in selected cells
Insert Cells
Add new cell above the current one
Add new cell below the current one
Executing Cells
Run selected cell(s)
Run current cells down and create a new one below
Run current cells down and create a new one above
Run all cells
Run all cells above the current cell
Run all cells below the current cell
Change the cell type of current cell
Toggle, toggle scrolling and clear current outputs
Toggle, toggle scrolling and clear all output
View Cells
Toggle display of Jupyter logo and filename
Toggle display of toolbar
Toggle line numbers in cells
Toggle display of cell action icons: None, Edit metadata, Raw cell format, Slideshow, Attachments, Tags
[Figure: notebook toolbar in Command Mode and Edit Mode with numbered callouts]
Save and checkpoint
Insert cell below
Cut cell
Copy cell(s)
Paste cell(s) below
Move cell up
Move cell down
Run current cell
Interrupt kernel
Restart kernel
Display characteristics
Open command palette
Current kernel
Kernel status
Log out from notebook server
Asking For Help
Walk through a UI tour
List of built-in keyboard shortcuts
Edit the built-in keyboard shortcuts
Notebook help topics
Description of markdown available in notebook
Information on unofficial Jupyter Notebook extensions
Python help topics
IPython help topics
NumPy help topics
SciPy help topics
Matplotlib help topics
SymPy help topics
Pandas help topics
About Jupyter Notebook
NumPy
The NumPy library is the core library for scientific computing in Python. It provides a high-performance multidimensional array object, and tools for working with these arrays.
Use the following import convention:
>>> import numpy as np
NumPy Arrays
[Figure: a 1D array, a 2D array (axis 0, axis 1) and a 3D array (axis 0, axis 1, axis 2)]
Creating Arrays
>>> a = np.array([1, 2, 3])
>>> b = np.array([(1.5, 2, 3), (4, 5, 6)], dtype=float)
>>> c = np.array([[(1.5, 2, 3), (4, 5, 6)], [(3, 2, 1), (4, 5, 6)]],
                 dtype=float)
Initial Placeholders
>>> np.zeros((3, 4))                        Create an array of zeros
>>> np.ones((2, 3, 4), dtype=np.int16)      Create an array of ones
>>> d = np.arange(10, 25, 5)                Create an array of evenly spaced values (step value)
>>> np.linspace(0, 2, 9)                    Create an array of evenly spaced values (number of samples)
>>> e = np.full((2, 2), 7)                  Create a constant array
>>> f = np.eye(2)                           Create a 2X2 identity matrix
>>> np.random.random((2, 2))                Create an array with random values
>>> np.empty((3, 2))                        Create an empty array
I/O
Saving & Loading On Disk
>>> np.save('my_array', a)
>>> np.savez('array.npz', a, b)
>>> np.load('my_array.npy')
Saving & Loading Text Files
>>> np.loadtxt("myfile.txt")
>>> np.genfromtxt("my_file.csv", delimiter=',')
>>> np.savetxt("myarray.txt", a, delimiter=" ")
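The save and load calls above can be checked with a round trip. The sketch below writes into a temporary directory so that it leaves no files behind; the directory handling and the variable names `a_loaded`, `b_loaded` and `b_from_text` are assumptions made for this example, not part of the cheat sheet.

```python
import os
import tempfile

import numpy as np

a = np.array([1, 2, 3])
b = np.array([(1.5, 2, 3), (4, 5, 6)], dtype=float)

with tempfile.TemporaryDirectory() as tmp:
    # Binary .npy file: one array per file
    npy_path = os.path.join(tmp, "my_array.npy")
    np.save(npy_path, a)
    a_loaded = np.load(npy_path)

    # .npz archive: positional arrays are stored as arr_0, arr_1, ...
    npz_path = os.path.join(tmp, "array.npz")
    np.savez(npz_path, a, b)
    with np.load(npz_path) as archive:
        names = sorted(archive.files)
        b_loaded = archive["arr_1"]

    # Plain text: values are written as formatted floats
    txt_path = os.path.join(tmp, "myarray.txt")
    np.savetxt(txt_path, b, delimiter=" ")
    b_from_text = np.loadtxt(txt_path)

print(a_loaded, names, b_from_text.shape)
```

The binary formats keep the dtype exactly, while the text format always comes back as floating point unless a `dtype` is passed to `np.loadtxt`.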
Subsetting, Slicing, Indexing
Also see Lists
Subsetting
>>> a[2]                                 Select the element at the 2nd index
3
>>> b[1, 2]                              Select the element at row 1 column 2 (equivalent to b[1][2])
6.0
Slicing
>>> a[0:2]                               Select items at index 0 and 1
array([1, 2])
>>> b[0:2, 1]                            Select items at rows 0 and 1 in column 1
array([2., 5.])
>>> b[:1]                                Select all items at row 0 (equivalent to b[0:1, :])
array([[1.5, 2., 3.]])
>>> c[1, ...]                            Same as [1, :, :]
array([[3., 2., 1.],
       [4., 5., 6.]])
>>> a[::-1]                              Reversed array a
array([3, 2, 1])
Boolean Indexing
>>> a[a < 2]                             Select elements from a less than 2
array([1])
Fancy Indexing
>>> b[[1, 0, 1, 0], [0, 1, 2, 0]]        Select elements (1,0), (0,1), (1,2) and (0,0)
array([4., 2., 6., 1.5])
>>> b[[1, 0, 1, 0]][:, [0, 1, 2, 0]]     Select a subset of the matrix’s rows and columns
array([[4., 5., 6., 4.],
       [1.5, 2., 3., 1.5],
       [4., 5., 6., 4.],
       [1.5, 2., 3., 1.5]])
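The two fancy-indexing forms above look similar but select different things: passing two index lists at once pairs them element by element, while indexing rows first and columns second builds a full grid. The sketch below contrasts them and adds `np.ix_`, which is a NumPy helper not mentioned in the cheat sheet, as a one-step way to get the grid.

```python
import numpy as np

b = np.array([(1.5, 2, 3), (4, 5, 6)], dtype=float)

# Paired indices: picks (1,0), (0,1), (1,2) and (0,0) -> 1D result
pairs = b[[1, 0, 1, 0], [0, 1, 2, 0]]

# Rows first, then columns: every chosen row crossed with every chosen column
block = b[[1, 0, 1, 0]][:, [0, 1, 2, 0]]

# np.ix_ builds the same cross product in a single indexing step
same_block = b[np.ix_([1, 0, 1, 0], [0, 1, 2, 0])]

print(pairs)
print(block)
```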
Sorting Arrays
>>> a.sort()            Sort an array
>>> c.sort(axis=0)      Sort the elements of an array's axis
Inspecting Your Array
>>> a.shape             Array dimensions
>>> len(a)              Length of array
>>> b.ndim              Number of array dimensions
>>> e.size              Number of array elements
>>> b.dtype             Data type of array elements
>>> b.dtype.name        Name of data type
>>> b.astype(int)       Convert an array to a different type
Aggregate Functions
>>> a.sum()             Array-wise sum
>>> a.min()             Array-wise minimum value
>>> b.max(axis=0)       Maximum value along axis 0 (one value per column)
>>> b.cumsum(axis=1)    Cumulative sum of the elements
>>> a.mean()            Mean
>>> np.median(b)        Median
>>> np.corrcoef(a)      Correlation coefficient
>>> np.std(b)           Standard deviation
Array Manipulation
Transposing Array
>>> i = np.transpose(b)     Permute array dimensions
>>> i.T                     Permute array dimensions
Changing Array Shape
>>> b.ravel()               Flatten the array
>>> g.reshape(3, -1)        Reshape, but don’t change data
Adding/Removing Elements
>>> h.resize((2, 6))        Resize the array in place to shape (2,6)
>>> np.append(h, g)         Append items to an array
>>> np.insert(a, 1, 5)      Insert items in an array
>>> np.delete(a, [1])       Delete items from an array
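One detail worth checking by hand is which of these calls return a new array and which modify their argument. In the sketch below, `np.append`, `np.insert` and `np.delete` leave `a` untouched, while the `ndarray.resize` method changes `h` in place and returns `None`. The `refcheck=False` argument is an addition made here so that the example also runs inside debuggers and notebooks that hold extra references to the array.

```python
import numpy as np

a = np.array([1, 2, 3])

appended = np.append(a, [4, 5])   # new array, a is unchanged
inserted = np.insert(a, 1, 5)     # new array with 5 placed before index 1
deleted = np.delete(a, [1])       # new array without the item at index 1

h = np.arange(6)
returned = h.resize((2, 6), refcheck=False)   # in place; new slots are zero-filled

print(appended, inserted, deleted)
print(h)
```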
Combining Arrays
>>> np.concatenate((a, d), axis=0)       Concatenate arrays
array([ 1,  2,  3, 10, 15, 20])
>>> np.vstack((a, b))                    Stack arrays vertically (row-wise)
array([[1., 2., 3.],
       [1.5, 2., 3.],
       [4., 5., 6.]])
>>> np.r_[e, f]                          Stack arrays vertically (row-wise)
>>> np.hstack((e, f))                    Stack arrays horizontally (column-wise)
array([[7., 7., 1., 0.],
       [7., 7., 0., 1.]])
>>> np.column_stack((a, d))              Create stacked column-wise arrays
array([[ 1, 10],
       [ 2, 15],
       [ 3, 20]])
>>> np.c_[a, d]                          Create stacked column-wise arrays
Splitting Arrays
>>> np.hsplit(a, 3)                      Split the array horizontally at the 3rd index
[array([1]), array([2]), array([3])]
>>> np.vsplit(c, 2)                      Split the array vertically at the 2nd index
[array([[[1.5, 2., 1.],
         [4., 5., 6.]]]),
 array([[[3., 2., 3.],
         [4., 5., 6.]]])]
Copying Arrays
>>> h = a.view()        Create a view of the array with the same data
>>> np.copy(a)          Create a copy of the array
>>> h = a.copy()        Create a deep copy of the array
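The practical difference between a view and a copy shows up as soon as one of them is modified. The following sketch uses `np.shares_memory`, a NumPy function not listed on the sheet, to make the relationship explicit; the names `view` and `copy` are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3])

view = a.view()    # new array object, same underlying data
copy = a.copy()    # new array object, independent data

view[0] = 99       # visible through a
copy[1] = -1       # not visible through a

print(a)                               # first element changed via the view
print(np.shares_memory(a, view))       # True
print(np.shares_memory(a, copy))       # False
```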
Data Types
>>> np.int64            Signed 64-bit integer types
>>> np.float32          Single-precision floating point (np.float64 is the standard double-precision type)
>>> np.complex128       Complex numbers represented by two 64-bit floats
>>> np.bool_            Boolean type storing TRUE and FALSE values
>>> np.object_          Python object type
>>> np.bytes_           Fixed-length string type
>>> np.str_             Fixed-length unicode type
NumPy Basics
Array Mathematics
Arithmetic Operations
>>> g = a - b               Subtraction
array([[-0.5, 0., 0.],
       [-3., -3., -3.]])
>>> np.subtract(a, b)       Subtraction
>>> b + a                   Addition
array([[2.5, 4., 6.],
       [5., 7., 9.]])
>>> np.add(b, a)            Addition
>>> a / b                   Division
array([[0.66666667, 1., 1.],
       [0.25, 0.4, 0.5]])
>>> np.divide(a, b)         Division
>>> a * b                   Multiplication
array([[1.5, 4., 9.],
       [4., 10., 18.]])
>>> np.multiply(a, b)       Multiplication
>>> np.exp(b)               Exponentiation
>>> np.sqrt(b)              Square root
>>> np.sin(a)               Print sines of an array
>>> np.cos(b)               Element-wise cosine
>>> np.log(a)               Element-wise natural logarithm
>>> e.dot(f)                Dot product
array([[7., 7.],
       [7., 7.]])
Comparison
>>> a == b                  Element-wise comparison
array([[False,  True,  True],
       [False, False, False]])
>>> a < 2                   Element-wise comparison
array([ True, False, False])
>>> np.array_equal(a, b)    Array-wise comparison
Asking For Help
>>> np.info(np.ndarray.dtype)
SciPy - Linear Algebra
SciPy
The SciPy library is one of the core packages for scientific computing that provides mathematical algorithms and convenience functions built on the NumPy extension of Python.
Linear Algebra
You’ll use the linalg and sparse modules. Note that scipy.linalg contains and expands on numpy.linalg.
>>> from scipy import linalg, sparse
Matrix Functions
Addition
>>> np.add(A, D)                     Addition
Subtraction
>>> np.subtract(A, D)                Subtraction
Division
>>> np.divide(A, D)                  Division
Multiplication
>>> np.multiply(D, A)                Multiplication
>>> np.dot(A, D)                     Dot product
>>> np.vdot(A, D)                    Vector dot product
>>> np.inner(A, D)                   Inner product
>>> np.outer(A, D)                   Outer product
>>> np.tensordot(A, D)               Tensor dot product
>>> np.kron(A, D)                    Kronecker product
Exponential Functions
>>> linalg.expm(A)                   Matrix exponential
>>> linalg.expm2(A)                  Matrix exponential (eigenvalue decomposition); removed from recent SciPy releases
>>> linalg.expm3(D)                  Matrix exponential (Taylor Series); removed from recent SciPy releases
Logarithm Function
>>> linalg.logm(A)                   Matrix logarithm
Trigonometric Functions
>>> linalg.sinm(D)                   Matrix sine
>>> linalg.cosm(D)                   Matrix cosine
>>> linalg.tanm(A)                   Matrix tangent
Hyperbolic Trigonometric Functions
>>> linalg.sinhm(D)                  Hyperbolic matrix sine
>>> linalg.coshm(D)                  Hyperbolic matrix cosine
>>> linalg.tanhm(A)                  Hyperbolic matrix tangent
Matrix Sign Function
>>> linalg.signm(A)                  Matrix sign function
Matrix Square Root
>>> linalg.sqrtm(A)                  Matrix square root
Arbitrary Functions
>>> linalg.funm(A, lambda x: x*x)    Evaluate matrix function
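Matrix functions such as `linalg.expm` and `linalg.sqrtm` act on the matrix as a whole, which is different from applying `np.exp` or `np.sqrt` to each element. The sketch below uses two small matrices chosen for this example (`N` and `S` are not from the cheat sheet) because their exact results are easy to verify by hand.

```python
import numpy as np
from scipy import linalg

# N is nilpotent (N @ N is the zero matrix), so the exponential
# series I + N + N@N/2 + ... stops after the first two terms.
N = np.array([[0.0, 1.0],
              [0.0, 0.0]])
exp_N = linalg.expm(N)

# For a diagonal matrix the square root is the root of each diagonal entry.
S = np.array([[4.0, 0.0],
              [0.0, 9.0]])
sqrt_S = linalg.sqrtm(S)

# Element-wise exponential, shown for contrast: a different result
elementwise = np.exp(N)

print(exp_N)
print(sqrt_S)
print(elementwise)
```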
Creating Matrices
Also see NumPy
>>> A = np.matrix(np.random.random((2, 2)))
>>> B = np.asmatrix(b)
>>> C = np.asmatrix(np.random.random((10, 5)))
>>> D = np.asmatrix([[3, 4], [5, 6]])
Basic Matrix Routines
Inverse
>>> A.I                         Inverse
>>> linalg.inv(A)               Inverse
>>> A.T                         Transpose matrix
>>> A.H                         Conjugate transposition
>>> np.trace(A)                 Trace
Norm
>>> linalg.norm(A)              Frobenius norm
>>> linalg.norm(A, 1)           L1 norm (max column sum)
>>> linalg.norm(A, np.inf)      L inf norm (max row sum)
Rank
>>> np.linalg.matrix_rank(C)    Matrix rank
Determinant
>>> linalg.det(A)               Determinant
Solving linear problems
>>> linalg.solve(A, b)          Solver for dense matrices
>>> E = np.asmatrix(a).T        Solver for dense matrices
>>> linalg.lstsq(D, E)          Least-squares solution to linear matrix equation
Generalized inverse
>>> linalg.pinv(C)              Compute the pseudo-inverse of a matrix (least-squares solver)
>>> linalg.pinv2(C)             Compute the pseudo-inverse of a matrix (SVD); removed from recent SciPy releases
Interacting With NumPy
Also see NumPy
>>> import numpy as np
>>> a = np.array([1, 2, 3])
>>> b = np.array([(1+5j, 2j, 3j), (4j, 5j, 6j)])
>>> c = np.array([[(1.5, 2, 3), (4, 5, 6)], [(3, 2, 1), (4, 5, 6)]])
Index Tricks
>>> np.mgrid[0:5, 0:5]                  Create a dense meshgrid
>>> np.ogrid[0:2, 0:2]                  Create an open meshgrid
>>> np.r_[3, [0]*5, -1:1:10j]           Stack arrays vertically (row-wise)
>>> np.c_[b, c]                         Create stacked column-wise arrays
Type Handling
>>> np.real(c)                          Return the real part of the array elements
>>> np.imag(c)                          Return the imaginary part of the array elements
>>> np.real_if_close(c, tol=1000)       Return a real array if complex parts close to 0
>>> np.float32(np.pi)                   Cast object to a data type
Other Useful Functions
>>> from scipy import special
>>> np.angle(b, deg=True)               Return the angle of the complex argument
>>> g = np.linspace(0, np.pi, num=5)    Create an array of evenly spaced values (number of samples)
>>> g[3:] += np.pi
>>> np.unwrap(g)                        Unwrap
>>> np.logspace(0, 10, 3)               Create an array of evenly spaced values (log scale)
>>> np.select([c<4], [c*2])             Return values from a list of arrays depending on conditions
>>> special.factorial(a)                Factorial
>>> special.comb(10, 3, exact=True)     Combine N things taken at k time
>>> misc.central_diff_weights(3)        Weights for Np-point central derivative (scipy.misc; removed from recent SciPy releases)
>>> misc.derivative(myfunc, 1.0)        Find the n-th derivative of a function at a point (scipy.misc; removed from recent SciPy releases)
Polynomials
>>> from numpy import poly1d
>>> p = poly1d([3, 4, 5])               Create a polynomial object
Vectorizing Functions
>>> def myfunc(a):
...     if a < 0:
...         return a*2
...     else:
...         return a/2
>>> np.vectorize(myfunc)                Vectorize functions
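A short sketch of how the vectorized function and the polynomial object above might be used. The `otypes=[float]` argument is an addition made for this example: without it, `np.vectorize` infers the output dtype from the first result, which can silently truncate later values when the first one happens to be an integer. The names `vfunc`, `value_at_2` and `derivative` are illustrative.

```python
import numpy as np


def myfunc(a):
    """Scalar function: double negative inputs, halve everything else."""
    if a < 0:
        return a * 2
    else:
        return a / 2


# Wrap the scalar function so it can be called on whole arrays
vfunc = np.vectorize(myfunc, otypes=[float])
result = vfunc(np.array([-2, 4, -1, 3]))

# Polynomial 3x^2 + 4x + 5
p = np.poly1d([3, 4, 5])
value_at_2 = p(2)          # evaluate the polynomial at x = 2
derivative = p.deriv()     # 6x + 4

print(result)
print(value_at_2, derivative.coeffs)
```

`np.vectorize` is a convenience wrapper around a Python loop, so it improves readability rather than speed; for this function, `np.where(x < 0, x * 2, x / 2)` would be the faster array-native form.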
Shape Manipulation
>>> np.transpose(b)                     Permute array dimensions
>>> b.flatten()                         Flatten the array
>>> np.hstack((b, c))                   Stack arrays horizontally (column-wise)
>>> np.vstack((a, b))                   Stack arrays vertically (row-wise)
>>> np.hsplit(c, 2)                     Split the array horizontally at the 2nd index
>>> np.vsplit(d, 2)                     Split the array vertically at the 2nd index
Creating Sparse Matrices
>>> F = np.eye(3, k=1)                  Create a 3X3 matrix with ones on the first superdiagonal
>>> G = np.asmatrix(np.identity(2))     Create a 2x2 identity matrix
>>> C[C > 0.5] = 0
>>> H = sparse.csr_matrix(C)            Compressed Sparse Row matrix
>>> I = sparse.csc_matrix(D)            Compressed Sparse Column matrix
>>> J = sparse.dok_matrix(A)            Dictionary Of Keys matrix
>>> H.todense()                         Sparse matrix to full matrix
>>> sparse.isspmatrix_csc(A)            Identify sparse matrix
Sparse Matrix Routines
Inverse
>>> sparse.linalg.inv(I)                Inverse
Norm
>>> sparse.linalg.norm(I)               Norm
Solving linear problems
>>> sparse.linalg.spsolve(H, I)         Solver for sparse matrices
Decompositions
Eigenvalues and Eigenvectors
>>> la, v = linalg.eig(A)               Solve ordinary or generalized eigenvalue problem for square matrix
>>> l1, l2 = la                         Unpack eigenvalues
>>> v[:, 0]                             First eigenvector
>>> v[:, 1]                             Second eigenvector
>>> linalg.eigvals(A)                   Unpack eigenvalues
Singular Value Decomposition
>>> U, s, Vh = linalg.svd(B)            Singular Value Decomposition (SVD)
>>> M, N = B.shape
>>> Sig = linalg.diagsvd(s, M, N)       Construct sigma matrix in SVD
LU Decomposition
>>> P, L, U = linalg.lu(C)              LU Decomposition
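A quick way to check a decomposition is to multiply the factors back together. The sketch below does this for the SVD, where `linalg.diagsvd` pads the singular values into an M x N matrix, and verifies the defining property of the eigenpairs. The concrete matrices `B` and `A` are small fixed examples chosen here so that the result is reproducible; the sheet itself uses random matrices.

```python
import numpy as np
from scipy import linalg

B = np.array([[1.5, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# SVD: B = U @ Sig @ Vh, with Sig an M x N matrix holding s on its diagonal
U, s, Vh = linalg.svd(B)
M, N = B.shape
Sig = linalg.diagsvd(s, M, N)
B_rebuilt = U @ Sig @ Vh

# Eigen-decomposition: A @ v[:, j] equals la[j] * v[:, j] for every column j
A = np.array([[2.0, 0.0],
              [0.0, 3.0]])
la, v = linalg.eig(A)
residual = A @ v - v * la    # broadcasting scales column j by la[j]

print(Sig.shape)
print(np.allclose(B_rebuilt, B), np.allclose(residual, 0))
```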
Sparse Matrix Functions
>>> sparse.linalg.expm(I)               Sparse matrix exponential
Sparse Matrix Decompositions
>>> la, v = sparse.linalg.eigs(F, 1)    Eigenvalues and eigenvectors
>>> sparse.linalg.svds(H, 2)            SVD
Asking For Help
>>> help(scipy.linalg.diagsvd)
>>> np.info(np.matrix)
Pandas Basics
Pandas
The Pandas library is built on NumPy and provides easy-to-use data structures and data analysis tools for the Python programming language.
Use the following import convention:
>>> import pandas as pd
Pandas Data Structures
Series
A one-dimensional labeled array capable of holding any data type
>>> s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])
Index    Value
a        3
b        -5
c        7
d        4
DataFrame
A two-dimensional labeled data structure with columns of potentially different types
>>> data = {'Country': ['Belgium', 'India', 'Brazil'],
            'Capital': ['Brussels', 'New Delhi', 'Brasília'],
            'Population': [11190846, 1303171035, 207847528]}
>>> df = pd.DataFrame(data,
                      columns=['Country', 'Capital', 'Population'])
Index    Country    Capital      Population
0        Belgium    Brussels     11190846
1        India      New Delhi    1303171035
2        Brazil     Brasília     207847528
Asking For Help
>>> help(pd.Series.loc)
Dropping
>>> s.drop(['a', 'c'])                  Drop values from rows (axis=0)
>>> df.drop('Country', axis=1)          Drop values from columns (axis=1)
Sort & Rank
>>> df.sort_index()                     Sort by labels along an axis
>>> df.sort_values(by='Country')        Sort by the values along an axis
>>> df.rank()                           Assign ranks to entries
Retrieving Series/DataFrame Information
Basic Information
>>> df.shape                            (rows, columns)
>>> df.index                            Describe index
>>> df.columns                          Describe DataFrame columns
>>> df.info()                           Info on DataFrame
>>> df.count()                          Number of non-NA values
Summary
>>> df.sum()                            Sum of values
>>> df.cumsum()                         Cumulative sum of values
>>> df.min() / df.max()                 Minimum/maximum values
>>> df.idxmin() / df.idxmax()           Minimum/Maximum index value
>>> df.describe()                       Summary statistics
>>> df.mean(numeric_only=True)          Mean of values
>>> df.median(numeric_only=True)        Median of values
Applying Functions
>>> f = lambda x: x*2
>>> df.apply(f)                         Apply function
>>> df.applymap(f)                      Apply function element-wise
Selection
Also see NumPy Arrays
Getting
>>> s['b']                              Get one element
-5
>>> df[1:]                              Get subset of a DataFrame
   Country    Capital  Population
1    India  New Delhi  1303171035
2   Brazil   Brasília   207847528
Selecting, Boolean Indexing & Setting
By Position
>>> df.iloc[0, 0]                       Select single value by row & column
'Belgium'
>>> df.iat[0, 0]
'Belgium'
By Label
>>> df.loc[0, 'Country']                Select single value by row & column labels
'Belgium'
>>> df.at[0, 'Country']
'Belgium'
By Label/Position (.ix was removed in pandas 1.0; .loc is used here)
>>> df.loc[2]                           Select single row of subset of rows
Country          Brazil
Capital        Brasília
Population    207847528
>>> df.loc[:, 'Capital']                Select a single column of subset of columns
0     Brussels
1    New Delhi
2     Brasília
>>> df.loc[1, 'Capital']                Select rows and columns
'New Delhi'
Boolean Indexing
>>> s[~(s > 1)]                         Series s where value is not >1
>>> s[(s < -1) | (s > 2)]               s where value is <-1 or >2
>>> df[df['Population'] > 1200000000]   Use filter to adjust DataFrame
Setting
>>> s['a'] = 6                          Set index a of Series s to 6
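The four scalar accessors differ only in whether they take positions or labels and in whether they accept slices. As a sketch under the assumption of a current pandas release, where `.ix` is no longer available, the selections from this section can be written as follows; the result names are illustrative.

```python
import pandas as pd

data = {'Country': ['Belgium', 'India', 'Brazil'],
        'Capital': ['Brussels', 'New Delhi', 'Brasília'],
        'Population': [11190846, 1303171035, 207847528]}
df = pd.DataFrame(data, columns=['Country', 'Capital', 'Population'])

# Single value by integer position
by_position = df.iloc[0, 0]
by_position_fast = df.iat[0, 0]        # scalar-only accessor

# Single value by row label and column label
by_label = df.loc[0, 'Country']
by_label_fast = df.at[0, 'Country']    # scalar-only accessor

# Whole row, whole column, and a boolean filter
row = df.loc[2]
capitals = df.loc[:, 'Capital']
large = df[df['Population'] > 1200000000]

print(by_position, by_label)
print(row['Capital'], capitals.tolist(), large['Country'].tolist())
```

With the default integer index used here, the row label `0` and the row position `0` happen to coincide; after sorting or filtering they generally do not, which is when the choice between `.loc` and `.iloc` starts to matter.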
I/O
Read and Write to CSV
>>> pd.read_csv('file.csv', header=None, nrows=5)
>>> df.to_csv('myDataFrame.csv')
Read and Write to Excel
>>> pd.read_excel('file.xlsx')
>>> df.to_excel('dir/myDataFrame.xlsx', sheet_name='Sheet1')
Read multiple sheets from the same file
>>> xlsx = pd.ExcelFile('file.xls')
>>> df = pd.read_excel(xlsx, 'Sheet1')
Read and Write to SQL Query or Database Table
>>> from sqlalchemy import create_engine
>>> engine = create_engine('sqlite:///:memory:')
>>> pd.read_sql("SELECT * FROM my_table;", engine)
>>> pd.read_sql_table('my_table', engine)
>>> pd.read_sql_query("SELECT * FROM my_table;", engine)
read_sql() is a convenience wrapper around read_sql_table() and read_sql_query()
>>> df.to_sql('myDf', engine)
Data Alignment
Internal Data Alignment
NA values are introduced in the indices that don’t overlap:
>>> s3 = pd.Series([7, -2, 3], index=['a', 'c', 'd'])
>>> s + s3
a    10.0
b     NaN
c     5.0
d     7.0
Arithmetic Operations with Fill Methods
You can also do the internal data alignment yourself with the help of the fill methods:
>>> s.add(s3, fill_value=0)
a    10.0
b    -5.0
c     5.0
d     7.0
>>> s.sub(s3, fill_value=2)
>>> s.div(s3, fill_value=4)
>>> s.mul(s3, fill_value=3)
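The alignment behaviour can be reproduced in a few lines. In the sketch below, `plain` and `filled` are names chosen for this example; the two Series are the ones defined on the sheet.

```python
import pandas as pd

s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])
s3 = pd.Series([7, -2, 3], index=['a', 'c', 'd'])

# The + operator aligns on the index; 'b' exists only in s, so it becomes NaN
plain = s + s3

# add() with fill_value substitutes 0 for the missing side before adding
filled = s.add(s3, fill_value=0)

print(plain)
print(filled)
```

Both results are floating point even though the inputs are integers, because the alignment step has to be able to represent the missing value.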
Loading The Data
Also see NumPy & Pandas
Your data needs to be numeric and stored as NumPy arrays or SciPy sparse matrices. Other types that are convertible to numeric arrays, such as Pandas DataFrame, are also acceptable.
>>> import numpy as np
>>> X = np.random.random((10, 5))
>>> y = np.array(['M', 'M', 'F', 'F', 'M', 'F', 'M', 'M', 'F', 'F'])
>>> X[X < 0.7] = 0
Training And Test Data
>>> from sklearn.model_selection import train_test_split
>>> X_train, X_test, y_train, y_test = train_test_split(X,
                                                        y, random_state=0)
Preprocessing The Data
Standardization
>>> from sklearn.preprocessing import StandardScaler
>>> scaler = StandardScaler().fit(X_train)
>>> standardized_X = scaler.transform(X_train)
>>> standardized_X_test = scaler.transform(X_test)
Normalization
>>> from sklearn.preprocessing import Normalizer
>>> scaler = Normalizer().fit(X_train)
>>> normalized_X = scaler.transform(X_train)
>>> normalized_X_test = scaler.transform(X_test)
Binarization
>>> from sklearn.preprocessing import Binarizer
>>> binarizer = Binarizer(threshold=0.0).fit(X)
>>> binary_X = binarizer.transform(X)
Encoding Categorical Features
>>> from sklearn.preprocessing import LabelEncoder
>>> enc = LabelEncoder()
>>> y = enc.fit_transform(y)
Imputing Missing Values
>>> from sklearn.impute import SimpleImputer
>>> imp = SimpleImputer(missing_values=0, strategy='mean')
>>> imp.fit_transform(X_train)
Generating Polynomial Features
>>> from sklearn.preprocessing import PolynomialFeatures
>>> poly = PolynomialFeatures(5)
>>> poly.fit_transform(X)
Create Your Model
Supervised Learning Estimators
Linear Regression
>>> from sklearn.linear_model import LinearRegression
>>> lr = LinearRegression()
Support Vector Machines (SVM)
>>> from sklearn.svm import SVC
>>> svc = SVC(kernel='linear')
Naive Bayes
>>> from sklearn.naive_bayes import GaussianNB
>>> gnb = GaussianNB()
KNN
>>> from sklearn import neighbors
>>> knn = neighbors.KNeighborsClassifier(n_neighbors=5)
Unsupervised Learning Estimators
Principal Component Analysis (PCA)
>>> from sklearn.decomposition import PCA
>>> pca = PCA(n_components=0.95)
K Means
>>> from sklearn.cluster import KMeans
>>> k_means = KMeans(n_clusters=3, random_state=0)
Model Fitting
Supervised learning
>>> lr.fit(X, y)                                      Fit the model to the data
>>> knn.fit(X_train, y_train)
>>> svc.fit(X_train, y_train)
Unsupervised Learning
>>> k_means.fit(X_train)                              Fit the model to the data
>>> pca_model = pca.fit_transform(X_train)            Fit to data, then transform it
Tune Your Model
Prediction
Supervised Estimators
>>> y_pred = svc.predict(np.random.random((2, 5)))    Predict labels
>>> y_pred = lr.predict(X_test)                       Predict labels
>>> y_pred = knn.predict_proba(X_test)                Estimate probability of a label
Unsupervised Estimators
>>> y_pred = k_means.predict(X_test)                  Predict labels in clustering algos
Evaluate Your Model’s Performance
Regression Metrics
Mean Absolute Error
>>> from sklearn.metrics import mean_absolute_error
>>> y_true = [3, -0.5, 2]
>>> mean_absolute_error(y_true, y_pred)
Mean Squared Error
>>> from sklearn.metrics import mean_squared_error
>>> mean_squared_error(y_test, y_pred)
R2 Score
>>> from sklearn.metrics import r2_score
>>> r2_score(y_true, y_pred)
Clustering Metrics
Adjusted Rand Index
>>> from sklearn.metrics import adjusted_rand_score
>>> adjusted_rand_score(y_true, y_pred)
Homogeneity
>>> from sklearn.metrics import homogeneity_score
>>> homogeneity_score(y_true, y_pred)
V-measure
>>> from sklearn.metrics import v_measure_score
>>> v_measure_score(y_true, y_pred)
Cross-Validation
>>> from sklearn.model_selection import cross_val_score
>>> print(cross_val_score(knn, X_train, y_train, cv=4))
>>> print(cross_val_score(lr, X, y, cv=2))
Classification Metrics
Accuracy Score
>>> knn.score(X_test, y_test)                         Estimator score method
>>> from sklearn.metrics import accuracy_score
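The individual steps on this sheet (load, split, preprocess, fit, predict, evaluate) can be chained into one short script. The following is a minimal sketch, assuming a recent scikit-learn release; the synthetic data, the sample size of 40 and the fixed random seed are choices made for this example so that it runs on its own, and the resulting accuracy on random data carries no meaning.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic data: 40 samples, 5 numeric features, two string labels
rng = np.random.RandomState(0)
X = rng.random_sample((40, 5))
y = np.array(['M', 'F'] * 20)

# Hold out 25% of the rows (the default) for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit the scaler on the training rows only, then apply it to both sets
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)

# Fit, predict and score
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train_std, y_train)
y_pred = knn.predict(X_test_std)

acc = accuracy_score(y_test, y_pred)
print(X_train.shape, X_test.shape, acc)
```

For a classifier, `knn.score(X_test_std, y_test)` reports the same accuracy as `accuracy_score(y_test, y_pred)`; the metric function is the more general form because it works on any pair of label arrays.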