Beyond Integer indexing

I ran into an interesting indexing problem recently.

a : (B, S, T)
b : (B, C)  where 0 <= b[i, j] < S

What I want is an array out of shape (B, C, T), where out[i, j, :] = a[i, b[i, j], :]

import numpy as np

a = np.array(
    [[[0, 1, 2, 3],
      [4, 5, 6, 7],
      [8, 9, 10, 11]],
     [[0, 1, 2, 3],
      [4, 5, 6, 7],
      [8, 9, 10, 11]]])

b = np.array(
    [[0, 2, 2],
     [1, 0, 2]])

a.shape  # (2, 3, 4)
b.shape  # (2, 3)

The result I expect is this:

array([[[ 0,  1,  2,  3],
        [ 8,  9, 10, 11],
        [ 8,  9, 10, 11]],
       [[ 4,  5,  6,  7],
        [ 0,  1,  2,  3],
        [ 8,  9, 10, 11]]])

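Note that this is different from the typical scenario, where the same indices are applied to every batch element; here each row of b selects from its own batch element of a.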

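Initially I hit some issues with how integer index arrays broadcast, but it turns out to be possible: index the first axis with a column of batch indices of shape (B, 1), which broadcasts against b of shape (B, C).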

a[np.arange(2)[:, None], b]  # batch indices of shape (2, 1) broadcast against b of shape (2, 3)
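
As a self-contained check, here is a minimal sketch that rebuilds a and b from the example, applies the broadcast indexing for a general batch size, and compares the result with the expected array. The np.take_along_axis variant is an alternative I am adding for comparison, not something the approach above relied on, and the names batch_idx, out, alt and expected are only illustrative.

```python
import numpy as np

# Same data as the example: two identical (3, 4) blocks stacked into (2, 3, 4).
a = np.stack([np.arange(12).reshape(3, 4)] * 2)
b = np.array([[0, 2, 2],
              [1, 0, 2]])

# Column of batch indices, shape (B, 1); it broadcasts against b of shape (B, C).
batch_idx = np.arange(a.shape[0])[:, None]
out = a[batch_idx, b]  # shape (B, C, T)

# Alternative (an assumption, not the original approach): give b a trailing
# axis so it broadcasts over T, then gather along axis 1.
alt = np.take_along_axis(a, b[:, :, None], axis=1)

expected = np.array([[[0, 1, 2, 3],
                      [8, 9, 10, 11],
                      [8, 9, 10, 11]],
                     [[4, 5, 6, 7],
                      [0, 1, 2, 3],
                      [8, 9, 10, 11]]])

print(np.array_equal(out, expected), np.array_equal(alt, expected))
```

Both forms compute out[i, j, :] = a[i, b[i, j], :]; the first one stays closest to the one-liner above.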

Tiling and repeating tensors

Repeat entire tensor:

Repeat elements of the tensor:
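
The snippets for these two cases are missing here, so what follows is only a sketch of how I would show the difference in NumPy (assuming NumPy arrays rather than another tensor library). np.tile repeats the whole array, while np.repeat repeats each element or row in place. The array m is a made-up example.

```python
import numpy as np

m = np.array([[1, 2],
              [3, 4]])  # hypothetical example array

# Repeat the entire array: two copies stacked along axis 0.
tiled = np.tile(m, (2, 1))
# [[1 2]
#  [3 4]
#  [1 2]
#  [3 4]]

# Repeat elements: each row is duplicated before moving on to the next one.
repeated = np.repeat(m, 2, axis=0)
# [[1 2]
#  [1 2]
#  [3 4]
#  [3 4]]

print(tiled)
print(repeated)
```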

 

Frequency Counting in Python

One of the most common operations in data analysis is computing frequency counts.

I wanted to list the various ways of doing this:

  • using Python collections: Counter and defaultdict
  • using numpy
    • with numpy.unique, with return_counts argument
    • with bincount, nonzero, zip / vstack
  • using pandas
  • using scipy
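
To make the list concrete, here is a minimal sketch of the collections and NumPy variants on a small made-up sample. The data list and all variable names are my own assumptions, and the bincount route only works for non-negative integers. The pandas and scipy variants are left out of the sketch.

```python
from collections import Counter, defaultdict

import numpy as np

data = [3, 1, 3, 0, 1, 3]  # hypothetical sample of small non-negative ints

# collections.Counter: counts hashable items directly.
counter_counts = Counter(data)

# collections.defaultdict: same idea with an explicit loop.
dd_counts = defaultdict(int)
for x in data:
    dd_counts[x] += 1

# numpy.unique with return_counts: sorted unique values and their counts.
values, counts = np.unique(data, return_counts=True)
unique_counts = dict(zip(values.tolist(), counts.tolist()))

# numpy.bincount + nonzero: bins[k] is the number of times k occurs;
# nonzero keeps only the values that actually appear.
bins = np.bincount(data)
present = np.nonzero(bins)[0]
bincount_counts = dict(zip(present.tolist(), bins[present].tolist()))

# vstack gives the same information as a (value, count) table.
pairs = np.vstack((present, bins[present])).T

print(unique_counts)
print(pairs)
```

All four approaches agree on this sample; Counter is the most convenient for arbitrary hashable items, while the NumPy versions are better suited to large numeric arrays.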

 


Getting started with NumPy

One of the best getting-started guides for NumPy is the Stanford CS231n tutorial:

http://cs231n.github.io/python-numpy-tutorial/

For NumPy broadcasting, this is a great guide:

 
