NumPy Quick Start

References:

https://docs.scipy.org/doc/numpy/user/quickstart.html

https://docs.scipy.org/doc/numpy-dev/reference/arrays.ndarray.html

https://github.com/familyld/learnpython/blob/master/Numpy_Learning.md

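NumPy's array class is called ndarray; as the name suggests, it is an n-dimensional array. Its most important attributes are:

ndarray.ndim: the number of dimensions (axes)

ndarray.shape: the size along each dimension; e.g. a 2-D array with 3 rows and 3 columns returns (3, 3)

ndarray.size: the total number of elements in the ndarray

ndarray.dtype: the data type of the elements, e.g. numpy.int32, numpy.int16, numpy.float64

ndarray.itemsize: the size of each element in bytes, equivalent to ndarray.dtype.itemsize

ndarray.data: the buffer holding the actual elements; normally not used directly

ndarray.T: the transposed array

flat: a flat (1-D) iterator over the array; assigning to it overwrites the array's elements, repeating the assigned values if there are fewer of them than elements

real/imag: the real part / imaginary part of a complex array

nbytes: the total number of bytes consumed by the elements

ndarray.base: the base array if this array is a view of another array (None if the array owns its memory)

ndarray.flags: information about the array's memory layout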

>>> import numpy as np
>>> a = np.arange(15).reshape(3, 5)
>>> a
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
>>> a.shape
(3, 5)
>>> a.ndim
2
>>> a.dtype.name
'int64'
>>> a.itemsize
8
>>> a.size
15
>>> type(a)
<class 'numpy.ndarray'>
>>> b = np.array([6, 7, 8])
>>> b
array([6, 7, 8])
>>> type(b)
<class 'numpy.ndarray'>

>>> b.flat
<numpy.flatiter object at 0x20456dd3670>
>>> b.flat = 1
>>> b
array([1, 1, 1])
>>> b.flat = [1,2]
>>> b
array([1, 2, 1])
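The session above does not exercise nbytes, T, real/imag, base or flags. The sketch below shows them on a small array. The array values and variable names are arbitrary and chosen only for illustration. The array is built from a nested list so that it owns its memory.

```python
import numpy as np

# Built from a nested list, so `a` owns its data (a.base is None)
a = np.array([[0., 1., 2.], [3., 4., 5.]])

# nbytes is size * itemsize: 6 elements * 8 bytes (float64) = 48
nbytes = a.nbytes

# T returns a transposed *view*; its base is the array that owns the memory
t = a.T
is_view_of_a = t.base is a

# real / imag give the real and imaginary parts of a complex array
z = np.array([1 + 2j, 3 - 4j])
re_parts, im_parts = z.real, z.imag      # [1.0, 3.0] and [2.0, -4.0]

# flags describe the memory layout; the transpose of a C-ordered
# 2-D array is Fortran-ordered and does not own its data
layout = (a.flags['C_CONTIGUOUS'], t.flags['C_CONTIGUOUS'])
owns = (a.flags['OWNDATA'], t.flags['OWNDATA'])
```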

Basics

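Array Creation

There are several ways to create arrays.

Create an array from a regular Python list or tuple with the array function; the dtype is inferred from the elements.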

>>> import numpy as np
>>> a = np.array([2,3,4])
>>> a
array([2, 3, 4])
>>> a.dtype
dtype('int64')
>>> b = np.array([1.2, 3.5, 5.1])
>>> b.dtype
dtype('float64')

A frequent error is calling array with multiple numeric arguments instead of a single sequence:

>>> a = np.array(1,2,3,4)    # WRONG
>>> a = np.array([1,2,3,4])  # RIGHT

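array transforms sequences of sequences into two-dimensional arrays, sequences of sequences of sequences into three-dimensional arrays, and so on.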

>>> b = np.array([(1.5,2,3), (4,5,6)])
>>> b
array([[ 1.5,  2. ,  3. ],
       [ 4. ,  5. ,  6. ]])

The element type can also be specified explicitly at creation time:

>>> c = np.array( [ [1,2], [3,4] ], dtype=complex )
>>> c
array([[ 1.+0.j,  2.+0.j],
       [ 3.+0.j,  4.+0.j]])

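Often the elements of an array are unknown, but its size is known. NumPy therefore offers several functions that create arrays with initial placeholder content. This minimizes the need to grow arrays, which is an expensive operation.

For example, zeros creates an array full of zeros, ones creates an array full of ones, and empty creates an array whose initial content is arbitrary and depends on the current state of the memory. By default, the dtype of the created array is float64.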

>>> np.zeros( (3,4) )
array([[ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.]])
>>> np.ones( (2,3,4), dtype=np.int16 )                # dtype can also be specified
array([[[ 1, 1, 1, 1],
        [ 1, 1, 1, 1],
        [ 1, 1, 1, 1]],
       [[ 1, 1, 1, 1],
        [ 1, 1, 1, 1],
        [ 1, 1, 1, 1]]], dtype=int16)
>>> np.empty( (2,3) )                                 # uninitialized, output may vary
array([[  3.73603959e-262,   6.02658058e-154,   6.55490914e-260],
       [  5.30498948e-313,   3.14673309e-307,   1.00000000e+000]])

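To create sequences of numbers, NumPy provides arange, which is analogous to Python's built-in range. It accepts the same start, stop and step arguments but returns an array: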

>>> np.arange( 10, 30, 5 )
array([10, 15, 20, 25])
>>> np.arange( 0, 2, 0.3 )                 # it accepts float arguments
array([ 0. ,  0.3,  0.6,  0.9,  1.2,  1.5,  1.8])

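When arange is used with floating-point arguments, it is generally not possible to predict the number of elements obtained, because of finite floating-point precision. For this reason it is usually better to use linspace, which takes the number of elements as an argument instead of the step: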

>>> from numpy import pi
>>> np.linspace( 0, 2, 9 )                 # 9 numbers from 0 to 2
array([ 0.  ,  0.25,  0.5 ,  0.75,  1.  ,  1.25,  1.5 ,  1.75,  2.  ])
>>> x = np.linspace( 0, 2*pi, 100 )        # useful to evaluate function at lots of points
>>> f = np.sin(x)
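The contrast between arange and linspace can be made concrete. The sketch below uses arbitrary endpoints. arange's length is ceil((stop - start) / step), whereas linspace is told the count directly. The `retstep` and `endpoint` parameters are standard linspace options.

```python
import numpy as np

# arange: length is ceil((stop - start) / step); with a float step the
# result is sensitive to rounding, so the count is hard to predict in general
r = np.arange(0, 2, 0.3)

# linspace: state the number of samples; stop is included by default,
# and retstep=True also returns the spacing that was used
x, step = np.linspace(0, 2, 9, retstep=True)

# endpoint=False excludes stop, giving evenly spaced points like arange
y = np.linspace(0, 2, 8, endpoint=False)
```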

see also:

array, zeros, zeros_like, ones, ones_like, empty, empty_like, arange, linspace, numpy.random.rand, numpy.random.randn, fromfunction, fromfile

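Printing Arrays

When you print an array, NumPy displays it much like a nested list, but with the following layout: the last axis is printed from left to right, the second-to-last from top to bottom, and the remaining axes also from top to bottom, with each slice separated from the next by an empty line.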

>>> a = np.arange(6)                         # 1d array
>>> print(a)
[0 1 2 3 4 5]
>>>
>>> b = np.arange(12).reshape(4,3)           # 2d array
>>> print(b)
[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]
>>>
>>> c = np.arange(24).reshape(2,3,4)         # 3d array
>>> print(c)
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]

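If an array is too large to be printed, NumPy automatically skips the central part of the array and prints only the corners.

To force NumPy to print the entire array, change the printing options with set_printoptions. Current NumPy versions reject threshold=np.nan, which older tutorials use, so pass sys.maxsize instead:

>>> import sys
>>> np.set_printoptions(threshold=sys.maxsize)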
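A global set_printoptions call affects everything printed afterwards. The sketch below assumes NumPy 1.15 or later, where np.printoptions is available as a context manager. It raises the threshold only temporarily and then restores the previous setting. The 2000-element array is an arbitrary example larger than the default threshold of 1000 elements.

```python
import sys
import numpy as np

big = np.arange(2000)            # more elements than the default threshold (1000)

summary = str(big)               # summarized: only the corners, with "..." in between

# np.printoptions restores the previous settings when the block exits
with np.printoptions(threshold=sys.maxsize):
    full = str(big)              # every element is printed

after = str(big)                 # back to the summarized form
```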

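Basic Operations

Arithmetic operators on arrays apply elementwise. A new array is created and filled with the result.

Unlike in many matrix languages, the product operator * operates elementwise in NumPy. The matrix product is computed with the dot function or method, or with the @ operator in Python 3.5 and later: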

>>> A = np.array( [[1,1],
...             [0,1]] )
>>> B = np.array( [[2,0],
...             [3,4]] )
>>> A*B                         # elementwise product
array([[2, 0],
       [0, 4]])
>>> A.dot(B)                    # matrix product
array([[5, 4],
       [3, 4]])
>>> np.dot(A, B)                # another matrix product
array([[5, 4],
       [3, 4]])

Some operations, such as += and *=, act in place to modify an existing array rather than create a new one.
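In-place operators keep the same array object, which has a consequence for dtypes: the result must fit back into the original array. The sketch below uses arbitrary illustrative arrays. It relies on NumPy's default "same_kind" casting rule for in-place ufuncs. Under that rule, int to float is accepted, but float to int raises NumPy's UFuncTypeError, a TypeError subclass.

```python
import numpy as np

a = np.ones((2, 3), dtype=np.int64)
before = id(a)
a *= 3                          # in place: same object, every value is now 3
same_object = id(a) == before

b = np.linspace(0.0, 1.0, 6).reshape(2, 3)   # float64
b += a                          # fine: int64 values are cast up to float64

try:
    a += b                      # float64 -> int64 is not a "same_kind" cast
    refused = False
except TypeError:               # NumPy raises UFuncTypeError (a TypeError subclass)
    refused = True
```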

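Many unary operations, such as computing the sum of all the elements, are implemented as methods of the ndarray class, e.g. sum, min and max.

By default, these operations apply to the array as though it were a flat list of numbers, regardless of its shape. By specifying the axis parameter, you can apply an operation along a given axis, e.g. per column or per row: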

>>> b = np.arange(12).reshape(3,4)
>>> b
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>>
>>> b.sum(axis=0)                            # sum of each column
array([12, 15, 18, 21])
>>>
>>> b.min(axis=1)                            # min of each row
array([0, 4, 8])
>>>
>>> b.cumsum(axis=1)                         # cumulative sum along each row
array([[ 0,  1,  3,  6],
       [ 4,  9, 15, 22],
       [ 8, 17, 27, 38]])
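The axis argument names the axis that is collapsed, so a reduction removes that axis from the result's shape. As a sketch on the same 3x4 array, keepdims=True keeps the reduced axis with length 1. The result then broadcasts back against the original, here to centre each row on its own mean.

```python
import numpy as np

b = np.arange(12).reshape(3, 4)

col_sums = b.sum(axis=0)                    # axis 0 collapsed -> shape (4,)
row_means = b.mean(axis=1, keepdims=True)   # shape (3, 1) instead of (3,)

# Because row_means keeps a length-1 column axis, it broadcasts
# across each row of b
centred = b - row_means
```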

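Universal Functions

NumPy provides familiar mathematical functions such as sin, cos, exp, sqrt and add. In NumPy these are called universal functions (ufunc). They operate elementwise on an array and produce an array as output.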
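Beyond elementwise calls, ufuncs accept an `out` argument, and binary ufuncs also have `reduce` and `accumulate` methods. The sketch below shows these on small arbitrary inputs. `np.add.reduce` is what `sum` does under the hood for addition.

```python
import numpy as np

x = np.array([0.0, np.pi / 2, np.pi])
s = np.sin(x)                       # elementwise: approximately [0, 1, 0]

# Write the result into an existing buffer instead of allocating a new array
buf = np.empty(3)
np.exp(np.zeros(3), out=buf)        # buf now holds [1, 1, 1]

# Binary ufuncs can reduce an array or accumulate along it
v = np.array([1, 2, 3, 4])
total = np.add.reduce(v)            # 10, same as v.sum()
running = np.add.accumulate(v)      # [1, 3, 6, 10], same as v.cumsum()
largest = np.maximum.reduce(v)      # 4, same as v.max()
```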

See also:

all, any, apply_along_axis, argmax, argmin, argsort, average, bincount, ceil, clip, conj, corrcoef, cov, cross, cumprod, cumsum, diff, dot, floor, inner, inv, lexsort, max, maximum, mean, median, min, minimum, nonzero, outer, prod, re, round, sort, std, sum, trace, transpose, var, vdot, vectorize, where

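Indexing, Slicing and Iterating

One-dimensional arrays can be indexed, sliced and iterated over much like Python lists.

Multidimensional arrays can have one index per axis. These indices are given in a tuple separated by commas. When fewer indices than axes are provided, the missing indices are treated as complete slices (:).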

>>> def f(x,y):
...     return 10*x+y
...
>>> b = np.fromfunction(f,(5,4),dtype=int)
>>> b
array([[ 0,  1,  2,  3],
       [10, 11, 12, 13],
       [20, 21, 22, 23],
       [30, 31, 32, 33],
       [40, 41, 42, 43]])
>>> b[2,3]
23
>>> b[0:5, 1]                       # each row in the second column of b
array([ 1, 11, 21, 31, 41])
>>> b[ : ,1]                        # equivalent to the previous example
array([ 1, 11, 21, 31, 41])
>>> b[1:3, : ]                      # each column in the second and third row of b
array([[10, 11, 12, 13],
       [20, 21, 22, 23]])
>>> b[-1]                                  # the last row. Equivalent to b[-1,:]
array([40, 41, 42, 43])

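b[i] can also be written as b[i,...]. The dots (...) stand for as many colons as needed to produce a complete indexing tuple; for example, if x has 5 axes, x[1,2,...] is equivalent to x[1,2,:,:,:].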
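The ellipsis can appear at the start or the end of the index. Here is a sketch on an arbitrary 3-D array, using np.array_equal to confirm the equivalence with explicit colons.

```python
import numpy as np

c = np.arange(24).reshape(2, 3, 4)    # a 3-D array

# "..." expands to as many ":" as needed to cover the remaining axes
first_block = c[1, ...]               # same as c[1, :, :] -> shape (3, 4)
third_col = c[..., 2]                 # same as c[:, :, 2] -> shape (2, 3)
```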

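Iterating over a multidimensional array is done with respect to the first axis.

To iterate over every element of the array, use the flat attribute, which is an iterator over all the elements of the array: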

>>> for element in b.flat:
...     print(element)
...
0
1
2
3
10
11
12
13
20
21
22
23
30
31
32
33
40
41
42
43
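The three ways of iterating can be compared on a smaller 2x3 array, chosen here only to keep the output short. Plain iteration yields rows, flat yields elements, and np.ndenumerate (listed under See also) yields each element together with its index tuple.

```python
import numpy as np

b = np.arange(6).reshape(2, 3)

rows = [row.tolist() for row in b]       # iterates over the first axis
flat = [int(x) for x in b.flat]          # every element, in row-major order
indexed = [(idx, int(v)) for idx, v in np.ndenumerate(b)]   # (index, value) pairs
```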

See also:

Indexing, Indexing (reference), newaxis, ndenumerate, indices

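Shape Manipulation

The shape of an array can be changed with various commands. The following three commands all return a modified array and leave the original array unchanged: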

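>>> a = np.floor(10*np.random.random((3,4)))
>>> a
array([[ 2.,  8.,  0.,  6.],
       [ 4.,  5.,  1.,  1.],
       [ 8.,  9.,  3.,  6.]])
>>> a.ravel()  # returns the array, flattened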
array([ 2.,  8.,  0.,  6.,  4.,  5.,  1.,  1.,  8.,  9.,  3.,  6.])
>>> a.reshape(6,2)  # returns the array with a modified shape
array([[ 2.,  8.],
       [ 0.,  6.],
       [ 4.,  5.],
       [ 1.,  1.],
       [ 8.,  9.],
       [ 3.,  6.]])
>>> a.T  # returns the array, transposed
array([[ 2.,  4.,  8.],
       [ 8.,  5.,  9.],
       [ 0.,  1.,  3.],
       [ 6.,  1.,  6.]])
>>> a.T.shape
(4, 3)
>>> a.shape
(3, 4)

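ravel returns a flattened view of the original array whenever possible (for example, for a C-contiguous array). No data is copied in that case, but modifying the view's data also modifies the original array. When a view is not possible, ravel returns a copy. flatten always returns a copy.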
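The view-versus-copy behaviour can be checked with np.shares_memory. In the sketch below, the arrays are arbitrary examples. The transposed case is included because a.T is not C-contiguous, so ravel has to copy there.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # C-contiguous

r = a.ravel()        # contiguous input -> ravel returns a view
r[0] = 100           # ... so this also changes a[0, 0]

f = a.flatten()      # flatten always returns a copy
f[1] = -1            # a is unaffected

t = a.T.ravel()      # a.T is not C-contiguous, so ravel copies here
```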

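reshape returns an array with a modified shape, whereas the resize method modifies the array itself in place. Because resize works in place, it first checks that no other object references the array's data. This reference check can be disabled with refcheck=False.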
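The sketch below uses arbitrary arrays to contrast the variants. The function np.resize returns a new array and repeats the data to fill it. The method ndarray.resize works in place and zero-fills new slots. With the default refcheck=True, ndarray.resize refuses to run while a view still points into the buffer. refcheck=False is used here only on an array that has no views.

```python
import numpy as np

a = np.arange(6)
b = a.reshape(2, 3)              # new array object; a keeps shape (6,)

# Function form: returns a new array, repeating a's data to fill it
c = np.resize(a, (2, 4))         # [[0, 1, 2, 3], [4, 5, 0, 1]]

# Method form: in place, new slots are zero-filled.
# refcheck=False is safe here only because nothing else views d.
d = np.arange(6)
d.resize((2, 4), refcheck=False) # [[0, 1, 2, 3], [4, 5, 0, 0]]

# With the default refcheck=True, an existing view blocks the resize
e = np.arange(6)
v = e[:3]
try:
    e.resize((2, 4))
    refused = False
except ValueError:
    refused = True
```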

If a dimension is given as -1 in a reshaping operation, NumPy calculates that dimension automatically.

See also:

ndarray.shape, reshape, resize, ravel

Stacking Together Different Arrays

Several arrays can be stacked together along different axes with vstack and hstack:

>>> a = np.floor(10*np.random.random((2,2)))
>>> a
array([[ 8.,  8.],
       [ 0.,  0.]])
>>> b = np.floor(10*np.random.random((2,2)))
>>> b
array([[ 1.,  8.],
       [ 0.,  4.]])
>>> np.vstack((a,b))
array([[ 8.,  8.],
       [ 0.,  0.],
       [ 1.,  8.],
       [ 0.,  4.]])
>>> np.hstack((a,b))
array([[ 8.,  8.,  1.,  8.],
       [ 0.,  0.,  0.,  4.]])

column_stack stacks 1-D arrays as columns into a 2-D array. For 2-D arrays it is equivalent to hstack:

>>> from numpy import newaxis
>>> np.column_stack((a,b))     # with 2D arrays
array([[ 8.,  8.,  1.,  8.],
       [ 0.,  0.,  0.,  4.]])
>>> a = np.array([4.,2.])
>>> b = np.array([3.,8.])
>>> np.column_stack((a,b))     # returns a 2D array
array([[ 4., 3.],
       [ 2., 8.]])
>>> np.hstack((a,b))           # the result is different
array([ 4., 2., 3., 8.])
>>> a[:,newaxis]               # this allows to have a 2D columns vector
array([[ 4.],
       [ 2.]])
>>> np.column_stack((a[:,newaxis],b[:,newaxis]))
array([[ 4.,  3.],
       [ 2.,  8.]])
>>> np.hstack((a[:,newaxis],b[:,newaxis]))   # the result is the same
array([[ 4.,  3.],
       [ 2.,  8.]])

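On the other hand, row_stack is equivalent to vstack for any input arrays; it is simply an alias of vstack and is deprecated as of NumPy 2.0. In general, for arrays with more than two dimensions, hstack stacks along the second axis and vstack stacks along the first axis. concatenate takes an optional axis argument giving the number of the axis along which the concatenation should happen: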

>>> a = np.array([[1, 2], [3, 4]])
>>> b = np.array([[5, 6]])
>>> np.concatenate((a, b), axis=0)
array([[1, 2],
       [3, 4],
       [5, 6]])
>>> np.concatenate((a, b.T), axis=1)
array([[1, 2, 5],
       [3, 4, 6]])

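Note:

In complex cases, r_ and c_ are useful for creating arrays by stacking numbers along one axis, and they allow the use of range literals (:). When given arrays as arguments, they behave by default like vstack and hstack respectively, but they accept an optional argument giving the axis along which to concatenate.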

>>> np.r_[1:4,0,4]
array([1, 2, 3, 0, 4])
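A few more r_ and c_ forms, shown on arbitrary inputs. A complex "step" such as 5j means "this many points, endpoint included", as in linspace. c_ turns 1-D inputs into columns. A leading string such as '0,2' in r_ selects the concatenation axis and the minimum number of dimensions.

```python
import numpy as np

a = np.r_[1:4, 0, 4]                                  # slices and scalars -> [1, 2, 3, 0, 4]
b = np.r_[0:1:5j]                                     # 5 points from 0 to 1, endpoint included
c = np.c_[np.array([1, 2, 3]), np.array([4, 5, 6])]   # 1-D inputs become columns -> shape (3, 2)
d = np.r_['0,2', [1, 2, 3], [4, 5, 6]]                # axis 0, results at least 2-D
```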

See also:

hstack, vstack, column_stack, concatenate, c_, r_

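Splitting One Array into Several Smaller Ones

Using hsplit, you can split an array along its horizontal axis, either by specifying the number of equally shaped arrays to return, or by specifying the columns after which the division should occur: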

>>> a = np.floor(10*np.random.random((2,12)))
>>> a
array([[ 9.,  5.,  6.,  3.,  6.,  8.,  0.,  7.,  9.,  7.,  2.,  7.],
       [ 1.,  4.,  9.,  2.,  2.,  1.,  0.,  6.,  2.,  2.,  4.,  0.]])
>>> np.hsplit(a,3)   # Split a into 3
[array([[ 9.,  5.,  6.,  3.],
       [ 1.,  4.,  9.,  2.]]), array([[ 6.,  8.,  0.,  7.],
       [ 2.,  1.,  0.,  6.]]), array([[ 9.,  7.,  2.,  7.],
       [ 2.,  2.,  4.,  0.]])]
>>> np.hsplit(a,(3,4))   # Split a after the third and the fourth column
[array([[ 9.,  5.,  6.],
       [ 1.,  4.,  9.]]), array([[ 3.],
       [ 2.]]), array([[ 6.,  8.,  0.,  7.,  9.,  7.,  2.,  7.],
       [ 2.,  1.,  0.,  6.,  2.,  2.,  4.,  0.]])]

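vsplit splits along the vertical axis, and array_split lets you specify the axis along which to split. Unlike split, array_split also accepts a number of sections that does not divide the axis evenly.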
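A short sketch of vsplit and of the difference between split and array_split, using arbitrary arrays. When the sections are uneven, array_split gives the extra elements to the first sections, while split raises ValueError.

```python
import numpy as np

a = np.arange(12).reshape(4, 3)

top, bottom = np.vsplit(a, 2)                 # along axis 0: two (2, 3) arrays
left, right = np.array_split(a, 2, axis=1)    # 3 columns split unevenly: widths 2 and 1

sizes = [len(p) for p in np.array_split(np.arange(7), 3)]   # [3, 2, 2]

try:
    np.split(np.arange(7), 3)                 # split requires equal sections
    uneven_ok = True
except ValueError:
    uneven_ok = False
```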

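Copies and Views

Simple assignments make no copy of array objects or of their data. Python passes mutable objects as references, so function calls make no copy either: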

>>> a = np.arange(12)
>>> b = a            # no new object is created
>>> b is a           # a and b are two names for the same ndarray object
True
>>> b.shape = 3,4    # changes the shape of a
>>> a.shape
(3, 4)

>>> def f(x):
...     print(id(x))
...
>>> id(a)                           # id is a unique identifier of an object
148293216
>>> f(a)
148293216

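View or Shallow Copy

Different array objects can share the same data. The view method creates a new array object that looks at the same data. Changing the view's attributes, such as its shape, does not affect the original array, but changing the view's data does: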

>>> c = a.view()
>>> c is a
False
>>> c.base is a                        # c is a view of the data owned by a
True
>>> c.flags.owndata
False
>>>
>>> c.shape = 2,6                      # a's shape doesn't change
>>> a.shape
(3, 4)
>>> c[0,4] = 1234                      # a's data changes
>>> a
array([[   0,    1,    2,    3],
       [1234,    5,    6,    7],
       [   8,    9,   10,   11]])

Slicing an array returns a view of it:

>>> s = a[ : , 1:3]     # spaces added for clarity; could also be written "s = a[:,1:3]"
>>> s[:] = 10           # s[:] is a view of s. Note the difference between s=10 and s[:]=10
>>> a
array([[   0,   10,   10,    3],
       [1234,   10,   10,    7],
       [   8,   10,   10,   11]])

Deep Copy

The copy method makes a complete copy of the array and its data:

>>> d = a.copy()                          # a new array object with new data is created
>>> d is a
False
>>> d.base is a                           # d doesn't share anything with a
False
>>> d[0,0] = 9999
>>> a
array([[   0,   10,   10,    3],
       [1234,   10,   10,    7],
       [   8,   10,   10,   11]])
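Whether an indexing operation returns a view or a copy depends on the kind of index. The sketch below uses an arbitrary array. Basic slices give views, while integer-array ("fancy") and boolean indexing give copies. Calling copy() after slicing a small piece out of an array you no longer need also lets the large buffer be freed once nothing else references it.

```python
import numpy as np

a = np.arange(10)

s = a[2:5]            # basic slicing -> view
f = a[[2, 3, 4]]      # integer-array ("fancy") indexing -> copy
m = a[a > 6]          # boolean indexing -> copy as well

# An independent copy of a small piece; it does not keep a's buffer alive
small = a[:3].copy()
```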

Functions and Methods Overview

Array Creation

    arange, array, copy, empty, empty_like, eye, fromfile, fromfunction, identity, linspace, logspace, mgrid, ogrid, ones, ones_like, r_, zeros, zeros_like

Conversions

    ndarray.astype, atleast_1d, atleast_2d, atleast_3d, mat

Manipulations

    array_split, column_stack, concatenate, diagonal, dsplit, dstack, hsplit, hstack, ndarray.item, newaxis, ravel, repeat, reshape, resize, squeeze, swapaxes, take, transpose, vsplit, vstack

Questions

    all, any, nonzero, where

Ordering

    argmax, argmin, argsort, max, min, ptp, searchsorted, sort

Operations

    choose, compress, cumprod, cumsum, inner, ndarray.fill, imag, prod, put, putmask, real, sum

Basic Statistics

    cov, mean, std, var

Basic Linear Algebra

    cross, dot, outer, linalg.svd, vdot

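Advanced

Broadcasting Rules

Broadcasting lets universal functions work meaningfully with inputs that do not have exactly the same shape, so that the operation can still be carried out elementwise. Rule 1: if the arrays do not have the same number of dimensions, a 1 is repeatedly prepended to the shapes of the smaller arrays until all of them have the same number of dimensions. Rule 2: an array with size 1 along a particular dimension acts as if it had the size of the largest array along that dimension, with its values repeated along that dimension.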
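The two rules can be seen on a (4, 1) column combined with a (3,) row, arrays chosen only for illustration. np.broadcast reports the resulting shape without computing the result. Trailing lengths 3 and 4 cannot be reconciled by either rule, so that addition fails.

```python
import numpy as np

row = np.arange(3)                 # shape (3,)
col = np.arange(4).reshape(4, 1)   # shape (4, 1)

# Rule 1: row is treated as shape (1, 3); Rule 2: length-1 axes stretch
table = col * 10 + row             # shape (4, 3); table[i, j] == 10*i + j
shape = np.broadcast(col, row).shape

try:
    np.arange(3) + np.arange(4)    # trailing lengths 3 and 4: incompatible
    compatible = True
except ValueError:
    compatible = False
```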

Fancy Indexing and Index Tricks

Method one: indexing with arrays of indices

>>> a = np.arange(12)**2                       # the first 12 square numbers
>>> i = np.array( [ 1,1,3,8,5 ] )              # an array of indices
>>> a[i]                                       # the elements of a at the positions i
array([ 1,  1,  9, 64, 25])
>>>
>>> j = np.array( [ [ 3, 4], [ 9, 7 ] ] )      # a bidimensional array of indices
>>> a[j]                                       # the same shape as j
array([[ 9, 16],
       [81, 49]])

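You can even index with several arrays at once, one per dimension, to select elements pairwise. The index arrays for the different dimensions must have the same shape (or be broadcastable to a common shape):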

>>> a = np.arange(12).reshape(3,4)
>>> a
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> i = np.array( [ [0,1],                        # indices for the first dim of a
...                 [1,2] ] )
>>> j = np.array( [ [2,1],                        # indices for the second dim
...                 [3,3] ] )
>>>
>>> a[i,j]                                     # i and j must have equal shape
array([[ 2,  5],
       [ 7, 11]])
>>>
>>> a[i,2]
array([[ 2,  6],
       [ 6, 10]])
>>>
>>> a[:,j]                                     # i.e., a[ : , j]
array([[[ 2,  1],
        [ 3,  3]],
       [[ 6,  5],
        [ 7,  7]],
       [[10,  9],
        [11, 11]]])

An example: finding the maximum of each of several time-dependent series

>>> time = np.linspace(20, 145, 5)                 # time scale
>>> data = np.sin(np.arange(20)).reshape(5,4)      # 4 time-dependent series
>>> time
array([  20.  ,   51.25,   82.5 ,  113.75,  145.  ])
>>> data
array([[ 0.        ,  0.84147098,  0.90929743,  0.14112001],
       [-0.7568025 , -0.95892427, -0.2794155 ,  0.6569866 ],
       [ 0.98935825,  0.41211849, -0.54402111, -0.99999021],
       [-0.53657292,  0.42016704,  0.99060736,  0.65028784],
       [-0.28790332, -0.96139749, -0.75098725,  0.14987721]])
>>>
>>> ind = data.argmax(axis=0)                  # index of the maxima for each series
>>> ind
array([2, 0, 3, 1])
>>>
>>> time_max = time[ind]                       # times corresponding to the maxima
>>>
>>> data_max = data[ind, range(data.shape[1])] # => data[ind[0],0], data[ind[1],1]...
>>>
>>> time_max
array([  82.5 ,   20.  ,  113.75,   51.25])
>>> data_max
array([ 0.98935825,  0.84147098,  0.99060736,  0.6569866 ])
>>>
>>> np.all(data_max == data.max(axis=0))
True

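Indexing with arrays can also be used as a target to assign to. When the list of indices contains repetitions, the assignment is done several times, leaving behind the last value. Be careful with Python's += construct, which may not do what you expect: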

>>> a = np.arange(5)
>>> a[[0,0,2]]+=1
>>> a
array([1, 1, 3, 3, 4])

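Even though 0 occurs twice in the list of indices, the 0th element is incremented only once. This is because Python requires a += 1 to be equivalent to a = a + 1.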
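When every repeated index should count, the unbuffered ufunc method `np.add.at` applies the operation once per occurrence. The sketch below compares it with the buffered `+=` on the same arbitrary indices. The plain assignment case is shown with a hedge: the "last value wins" result is the usual behaviour, but NumPy's indexing documentation does not guarantee the order in general.

```python
import numpy as np

a = np.arange(5)
a[[0, 0, 2]] = [1, 2, 3]      # index 0 assigned twice; in practice the last value (2) stays

b = np.arange(5)
b[[0, 0, 2]] += 1             # buffered: index 0 is incremented only once
                              # b -> [1, 1, 3, 3, 4]

c = np.arange(5)
np.add.at(c, [0, 0, 2], 1)    # unbuffered: every occurrence is applied
                              # c -> [2, 1, 3, 3, 4]
```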

Method two: indexing with boolean arrays. Each boolean array must have the same length as the axis it indexes:

>>> a = np.arange(12).reshape(3,4)
>>> b1 = np.array([False,True,True])             # first dim selection
>>> b2 = np.array([True,False,True,False])       # second dim selection
>>>
>>> a[b1,:]                                   # selecting rows
array([[ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>>
>>> a[b1]                                     # same thing
array([[ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>>
>>> a[:,b2]                                   # selecting columns
array([[ 0,  2],
       [ 4,  6],
       [ 8, 10]])
>>>
>>> a[b1,b2]                                  # a weird thing to do
array([ 4, 10])
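The most common use of boolean indexing is a mask built from a condition on the array itself. The sketch below uses illustrative arrays. It shows that the mask selects a 1-D array of matching elements, that assigning through it modifies the array in place, and that conditions are combined with & and | (the parentheses are required).

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
mask = a > 4                  # boolean array with the same shape as a

selected = a[mask]            # 1-D array of the selected elements: 5, 6, ..., 11
a[mask] = 0                   # assignment through the mask modifies a in place

b = np.arange(10)
evens_above_2 = b[(b % 2 == 0) & (b > 2)]   # [4, 6, 8]
```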

An example: generating an image of the Mandelbrot set with boolean indexing

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> def mandelbrot( h,w, maxit=20 ):
...     """Returns an image of the Mandelbrot fractal of size (h,w)."""
...     y,x = np.ogrid[ -1.4:1.4:h*1j, -2:0.8:w*1j ]
...     c = x+y*1j
...     z = c
...     divtime = maxit + np.zeros(z.shape, dtype=int)
...
...     for i in range(maxit):
...         z = z**2 + c
...         diverge = np.abs(z) > 2                  # who is diverging
...         div_now = diverge & (divtime==maxit)  # who is diverging now
...         divtime[div_now] = i                  # note when
...         z[diverge] = 2                        # avoid diverging too much
...
...     return divtime
>>> plt.imshow(mandelbrot(400,400))
>>> plt.show()

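The ix_() function

The ix_ function can be used to combine different vectors so as to obtain the result for each n-uplet, i.e. for every combination of one element taken from each vector. For example, to compute a+b*c for all triplets taken from the vectors a, b and c: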

>>> a = np.array([2,3,4,5])
>>> b = np.array([8,5,4])
>>> c = np.array([5,4,6,8,3])
>>> ax,bx,cx = np.ix_(a,b,c)
>>> ax
array([[[2]],
       [[3]],
       [[4]],
       [[5]]])
>>> bx
array([[[8],
        [5],
        [4]]])
>>> cx
array([[[5, 4, 6, 8, 3]]])
>>> ax.shape, bx.shape, cx.shape
((4, 1, 1), (1, 3, 1), (1, 1, 5))
>>> result = ax+bx*cx
>>> result
array([[[42, 34, 50, 66, 26],
        [27, 22, 32, 42, 17],
        [22, 18, 26, 34, 14]],
       [[43, 35, 51, 67, 27],
        [28, 23, 33, 43, 18],
        [23, 19, 27, 35, 15]],
       [[44, 36, 52, 68, 28],
        [29, 24, 34, 44, 19],
        [24, 20, 28, 36, 16]],
       [[45, 37, 53, 69, 29],
        [30, 25, 35, 45, 20],
        [25, 21, 29, 37, 17]]])
>>> result[3,2,4]
17
>>> a[3]+b[2]*c[4]
17

You could also implement a reduce over several vectors like this:

>>> def ufunc_reduce(ufct, *vectors):
...    vs = np.ix_(*vectors)
...    r = ufct.identity
...    for v in vs:
...        r = ufct(r,v)
...    return r

>>> ufunc_reduce(np.add,a,b,c)
array([[[15, 14, 16, 18, 13],
        [12, 11, 13, 15, 10],
        [11, 10, 12, 14,  9]],
       [[16, 15, 17, 19, 14],
        [13, 12, 14, 16, 11],
        [12, 11, 13, 15, 10]],
       [[17, 16, 18, 20, 15],
        [14, 13, 15, 17, 12],
        [13, 12, 14, 16, 11]],
       [[18, 17, 19, 21, 16],
        [15, 14, 16, 18, 13],
        [14, 13, 15, 17, 12]]])

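The advantage of this version of reduce over the normal ufunc.reduce is that it uses the broadcasting rules to avoid creating an argument array the size of the output times the number of vectors.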
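ix_ is also the usual way to select a sub-matrix with lists of row and column indices. The sketch below uses an arbitrary 3x4 matrix. Passing two lists directly pairs the indices up element by element. Wrapping them in np.ix_ takes their cross product instead.

```python
import numpy as np

m = np.arange(12).reshape(3, 4)
rows = [0, 2]
cols = [1, 3]

pairs = m[rows, cols]            # pairwise: m[0, 1], m[2, 3] -> [1, 11]
block = m[np.ix_(rows, cols)]    # cross product: [[1, 3], [9, 11]]
```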

Linear Algebra

Simple array operations

>>> import numpy as np
>>> a = np.array([[1.0, 2.0], [3.0, 4.0]])
>>> print(a)
[[ 1.  2.]
 [ 3.  4.]]

>>> a.transpose()
array([[ 1.,  3.],
       [ 2.,  4.]])

>>> np.linalg.inv(a)
array([[-2. ,  1. ],
       [ 1.5, -0.5]])

>>> u = np.eye(2) # unit 2x2 matrix; "eye" represents "I"
>>> u
array([[ 1.,  0.],
       [ 0.,  1.]])
>>> j = np.array([[0.0, -1.0], [1.0, 0.0]])

>>> np.dot (j, j) # matrix product
array([[-1.,  0.],
       [ 0., -1.]])

>>> np.trace(u)  # trace
2.0

>>> y = np.array([[5.], [7.]])
>>> np.linalg.solve(a, y)
array([[-3.],
       [ 4.]])

>>> np.linalg.eig(j)
(array([ 0.+1.j,  0.-1.j]), array([[ 0.70710678+0.j        ,  0.70710678-0.j        ],
       [ 0.00000000-0.70710678j,  0.00000000+0.70710678j]]))
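Results from np.linalg are floating point, so it is good practice to verify them with np.allclose rather than exact comparison. The sketch below reuses the a, y and j matrices from the session. It checks that the solution of solve satisfies a @ x == y. It also checks that each eigenpair returned by eig satisfies j v_i = w_i v_i; eigenvectors are returned as the columns of v.

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([[5.0], [7.0]])

x = np.linalg.solve(a, y)              # [[-3.], [4.]]
residual_ok = np.allclose(a @ x, y)    # the solution satisfies a x = y

j = np.array([[0.0, -1.0], [1.0, 0.0]])
w, v = np.linalg.eig(j)                # eigenvalues w, eigenvectors in the columns of v
eig_ok = np.allclose(j @ v, v * w)     # column i: j v_i == w_i v_i
```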

Tricks and Tips

Automatic reshaping

>>> a = np.arange(30)
>>> a.shape = 2,-1,3  # -1 means "whatever is needed"
>>> a.shape
(2, 5, 3)
>>> a
array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8],
        [ 9, 10, 11],
        [12, 13, 14]],
       [[15, 16, 17],
        [18, 19, 20],
        [21, 22, 23],
        [24, 25, 26],
        [27, 28, 29]]])

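Vector Stacking

Covered above under Stacking Together Different Arrays (vstack, hstack, column_stack, concatenate).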

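Histograms

The NumPy histogram function, applied to an array, returns a pair of vectors: the histogram counts and the bin edges. matplotlib also has a function for building histograms (hist), which differs from the one in NumPy. The main difference is that matplotlib's hist plots the histogram automatically, while numpy.histogram only computes the data.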

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> # Build a vector of 10000 normal deviates with variance 0.5^2 and mean 2
>>> mu, sigma = 2, 0.5
>>> v = np.random.normal(mu,sigma,10000)
>>> # Plot a normalized histogram with 50 bins
>>> plt.hist(v, bins=50, density=True)       # matplotlib version (plot)
>>> plt.show()
>>> # Compute the histogram with numpy and then plot it
>>> (n, bins) = np.histogram(v, bins=50, density=True)  # NumPy version (no plot)
>>> plt.plot(.5*(bins[1:]+bins[:-1]), n)
>>> plt.show()
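The sketch below inspects what np.histogram returns, without plotting. It assumes NumPy 1.17 or later for `default_rng`, with a fixed seed chosen arbitrarily for reproducibility. There is one count per bin and one more edge than bins. With density=True, the heights times the bin widths sum to 1. The bin midpoints are what the session plots on the x axis.

```python
import numpy as np

rng = np.random.default_rng(0)         # assumed seed, for reproducibility only
v = rng.normal(2, 0.5, 10000)

counts, edges = np.histogram(v, bins=50)              # 50 counts, 51 bin edges
density, _ = np.histogram(v, bins=50, density=True)   # normalized heights

centers = 0.5 * (edges[1:] + edges[:-1])              # bin midpoints
area = (density * np.diff(edges)).sum()               # integrates to 1
```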

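Miscellaneous

Custom dtypes

When creating an array you can define a custom structured dtype, and its fields can also hold strings. For example:

>>> import numpy as np
>>> n_drops = 5
>>> rain_drops = np.zeros(n_drops, dtype=[('position', float, 2),
...                                       ('size', float),
...                                       ('growth', float),
...                                       ('color', float, 4),
...                                       ('name', str, 1)])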
>>> rain_drops
array([([0., 0.], 0., 0., [0., 0., 0., 0.], ''),
       ([0., 0.], 0., 0., [0., 0., 0., 0.], ''),
       ([0., 0.], 0., 0., [0., 0., 0., 0.], ''),
       ([0., 0.], 0., 0., [0., 0., 0., 0.], ''),
       ([0., 0.], 0., 0., [0., 0., 0., 0.], '')],
      dtype=[('position', '<f8', (2,)), ('size', '<f8'), ('growth', '<f8'), ('color', '<f8', (4,)), ('name', '<U1')])
>>> rain_drops[0]
([0., 0.], 0., 0., [0., 0., 0., 0.], '')
>>> rain_drops[0]['position']
array([0., 0.])
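Fields of a structured array can be read and written for all records at once, and records can be sorted by a field. The sketch below uses a smaller, hypothetical three-record array. Its field names mirror the example above, and a longer string type ('U8') is used so that names are not truncated to one character.

```python
import numpy as np

drops = np.zeros(3, dtype=[('position', float, 2),
                           ('size', float),
                           ('name', 'U8')])

drops['size'] = [0.3, 0.1, 0.2]        # a field name selects that field in every record
drops['name'] = ['a', 'b', 'c']
drops['position'][1] = [1.0, 2.0]      # field first, then record index

by_size = np.sort(drops, order='size') # sort the records by one field
```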

Some statistical functions

In the session below, arr11 is the 4x3 array np.arange(4, -8, -1).reshape(4, 3), i.e. [[4, 3, 2], [1, 0, -1], [-2, -3, -4], [-5, -6, -7]], which matches every arr11 output shown. The dtype=int32 suffixes depend on the platform and NumPy version. arr12 is a different array whose definition is not given.

In [109]: np.sum(arr11)   # sum of all elements
Out[109]: -18

In [110]: np.sum(arr11,axis = 0)    # sum of each column (note: axis=0)
Out[110]: array([ -2,  -6, -10])

In [111]: np.sum(arr11, axis = 1)     # sum of each row (note: axis=1)
Out[111]: array([  9,   0,  -9, -18])

In [112]: np.cumsum(arr11) # running sum over all elements in row-major order (left to right, then top to bottom)
Out[112]: array([  4,   7,   9,  10,  10,   9,   7,   4,   0,  -5, -11, -18], dtype=int32)

In [113]: np.cumsum(arr11, axis = 0) # running sum down each column; returns a 2-D array
Out[113]:
array([[  4,   3,   2],
       [  5,   3,   1],
       [  3,   0,  -3],
       [ -2,  -6, -10]], dtype=int32)

In [114]: np.cumprod(arr11, axis = 1) # running product along each row; returns a 2-D array
Out[114]:
array([[   4,   12,   24],
       [   1,    0,    0],
       [  -2,    6,  -24],
       [  -5,   30, -210]], dtype=int32)

In [115]: np.min(arr11)   # minimum of all elements
Out[115]: -7

In [116]: np.max(arr11, axis = 0) # maximum of each column
Out[116]: array([4, 3, 2])

In [117]: np.mean(arr11)  # mean of all elements
Out[117]: -1.5

In [118]: np.mean(arr11, axis = 1) # mean of each row
Out[118]: array([ 3.,  0., -3., -6.])

In [119]: np.median(arr11)   # median of all elements
Out[119]: -1.5

In [120]: np.median(arr11, axis = 0)   # median of each column
Out[120]: array([-0.5, -1.5, -2.5])

In [121]: np.var(arr12)   # variance of all elements
Out[121]: 5.354166666666667

In [122]: np.std(arr12, axis = 1)   # standard deviation of each row
Out[122]: array([ 2.49443826,  1.88561808,  1.69967317,  2.1602469 ])

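In addition, NumPy provides set operations on 1-D arrays:

unique(x): the sorted unique elements of x
intersect1d(x, y): the elements common to x and y (intersection)
union1d(x, y): the union of x and y
setdiff1d(x, y): the set difference, i.e. elements in x that are not in y
setxor1d(x, y): the symmetric difference, i.e. elements in exactly one of the two arrays
in1d(x, y): whether each element of x is contained in y (deprecated in NumPy 2.0 in favour of isin)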
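The set functions listed above can be tried on two small arbitrary arrays. All results are sorted 1-D arrays, except isin, which returns a boolean mask with the shape of x. isin is the non-deprecated replacement for in1d. unique can also return counts.

```python
import numpy as np

x = np.array([3, 1, 2, 3, 1])
y = np.array([2, 3, 5])

u = np.unique(x)                                   # [1, 2, 3]
vals, counts = np.unique(x, return_counts=True)    # counts -> [2, 1, 2]
inter = np.intersect1d(x, y)                       # [2, 3]
union = np.union1d(x, y)                           # [1, 2, 3, 5]
diff = np.setdiff1d(x, y)                          # [1]
sym = np.setxor1d(x, y)                            # [1, 5]
member = np.isin(x, y)                             # [True, False, True, True, False]
```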

The numpy.random module

Some commonly used (legacy, module-level) functions in numpy.random:

rand(d0, d1, ..., dn)               Random values in a given shape.
randn(d0, d1, ..., dn)              Return a sample (or samples) from the “standard normal” distribution.
randint(low[, high, size, dtype])   Return random integers from low (inclusive) to high (exclusive).
random_integers(low[, high, size])  Random integers of type np.int_ between low and high, inclusive (deprecated; use randint).
random_sample([size])               Return random floats in the half-open interval [0.0, 1.0).
random([size])                      Return random floats in the half-open interval [0.0, 1.0).
ranf([size])                        Return random floats in the half-open interval [0.0, 1.0).
sample([size])                      Return random floats in the half-open interval [0.0, 1.0).
choice(a[, size, replace, p])       Generates a random sample from a given 1-D array
bytes(length)                       Return random bytes.
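Since NumPy 1.17, the recommended interface is a Generator object created with np.random.default_rng. The functions listed above remain available as the legacy API. The sketch below maps a few of them to Generator methods. The seed 42 is an arbitrary choice for reproducibility. Unlike randint, `integers` can include the upper bound via `endpoint=True`.

```python
import numpy as np

rng = np.random.default_rng(42)       # arbitrary seed: results are reproducible

u = rng.random((2, 3))                # floats in [0.0, 1.0), like random_sample
k = rng.integers(0, 10, size=5)       # ints in [0, 10), like randint
dice = rng.integers(1, 6, size=5, endpoint=True)   # ints in [1, 6], inclusive
z = rng.standard_normal(4)            # like randn
pick = rng.choice(['a', 'b', 'c'], size=3, replace=False)

# Re-seeding with the same value reproduces the same stream
same = np.array_equal(np.random.default_rng(42).random((2, 3)), u)
```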

array conversion

ndarray.item(*args) Copy an element of an array to a standard Python scalar and return it.
ndarray.tolist()    Return the array as a (possibly nested) list.
ndarray.itemset(*args)  Insert scalar into an array (scalar is cast to array’s dtype, if possible). Removed in NumPy 2.0.
ndarray.tostring([order])   Construct Python bytes containing the raw data bytes in the array. Deprecated since NumPy 1.19; use tobytes.
ndarray.tobytes([order])    Construct Python bytes containing the raw data bytes in the array.
ndarray.tofile(fid[, sep, format])  Write array to a file as text or binary (default).
ndarray.dump(file)  Dump a pickle of the array to the specified file.
ndarray.dumps() Returns the pickle of the array as a string.
ndarray.astype(dtype[, order, casting, ...])    Copy of the array, cast to a specified type.
ndarray.byteswap(inplace)   Swap the bytes of the array elements
ndarray.copy([order])   Return a copy of the array.
ndarray.view([dtype, type]) New view of array with the same data.
ndarray.getfield(dtype[, offset])   Returns a field of the given array as a certain type.
ndarray.setflags([write, align, uic])   Set array flags WRITEABLE, ALIGNED, and WRITEBACKIFCOPY, respectively.
ndarray.fill(value) Fill the array with a scalar value.

shape manipulation

ndarray.reshape(shape[, order]) Returns an array containing the same data with a new shape.
ndarray.resize(new_shape[, refcheck])   Change shape and size of array in-place.
ndarray.transpose(*axes)    Returns a view of the array with axes transposed.
ndarray.swapaxes(axis1, axis2)  Return a view of the array with axis1 and axis2 interchanged.
ndarray.flatten([order])    Return a copy of the array collapsed into one dimension.
ndarray.ravel([order])  Return a flattened array.
ndarray.squeeze([axis]) Remove single-dimensional entries from the shape of a.

Item selection and manipulation

ndarray.take(indices[, axis, out, mode])    Return an array formed from the elements of a at the given indices.
ndarray.put(indices, values[, mode])    Set a.flat[n] = values[n] for all n in indices.
ndarray.repeat(repeats[, axis]) Repeat elements of an array.
ndarray.choose(choices[, out, mode])    Use an index array to construct a new array from a set of choices.
ndarray.sort([axis, kind, order])   Sort an array, in-place.
ndarray.argsort([axis, kind, order])    Returns the indices that would sort this array.
ndarray.partition(kth[, axis, kind, order]) Rearranges the elements in the array in such a way that value of the element in kth position is in the position it would be in a sorted array.
ndarray.argpartition(kth[, axis, kind, order])  Returns the indices that would partition this array.
ndarray.searchsorted(v[, side, sorter]) Find indices where elements of v should be inserted in a to maintain order.
ndarray.nonzero()   Return the indices of the elements that are non-zero.
ndarray.compress(condition[, axis, out])    Return selected slices of this array along given axis.
ndarray.diagonal([offset, axis1, axis2])    Return specified diagonals.

calculation

ndarray.argmax([axis, out]) Return indices of the maximum values along the given axis.
ndarray.min([axis, out, keepdims])  Return the minimum along a given axis.
ndarray.argmin([axis, out]) Return indices of the minimum values along the given axis of a.
ndarray.ptp([axis, out])    Peak to peak (maximum - minimum) value along a given axis.
ndarray.clip([min, max, out])   Return an array whose values are limited to [min, max].
ndarray.conj()  Complex-conjugate all elements.
ndarray.round([decimals, out])  Return a with each element rounded to the given number of decimals.
ndarray.trace([offset, axis1, axis2, dtype, out])   Return the sum along diagonals of the array.
ndarray.sum([axis, dtype, out, keepdims])   Return the sum of the array elements over the given axis.
ndarray.cumsum([axis, dtype, out])  Return the cumulative sum of the elements along the given axis.
ndarray.mean([axis, dtype, out, keepdims])  Returns the average of the array elements along given axis.
ndarray.var([axis, dtype, out, ddof, keepdims]) Returns the variance of the array elements, along given axis.
ndarray.std([axis, dtype, out, ddof, keepdims]) Returns the standard deviation of the array elements along given axis.
ndarray.prod([axis, dtype, out, keepdims])  Return the product of the array elements over the given axis
ndarray.cumprod([axis, dtype, out]) Return the cumulative product of the elements along the given axis.
ndarray.all([axis, out, keepdims])  Returns True if all elements evaluate to True.
ndarray.any([axis, out, keepdims])  Returns True if any of the elements of a evaluate to True.