Python itself, like any language, is fairly limited in what it can do out of the box. The real power of using Python for machine learning, data mining, and data science comes from the external libraries that are available for it. One of those libraries is called NumPy, or Numerical Python, and, for example, here we can import the NumPy package, which is included with Canopy, as np.
Sure enough, I get different results. That's pretty cool.
Let's move on to data structures. If you need to pause and let things sink in a little bit, or you want to play around with these a little bit more, feel free to do so. The best way to learn this stuff is to dive in and actually experiment, so I definitely encourage doing that, and that's why I'm giving you working IPython/Jupyter Notebooks, so you can actually go in, mess with the code, do different stuff with it.
For example, here we have a distribution around 25.0, but let's make it around 55.0:
import numpy as np
A = np.random.normal(55.0, 5.0, 10)
print (A)
Hey, all my numbers changed, they're closer to 55 now, how about that?
Alright, let's talk about data structures a little bit here. As we saw in our first example, you can have a list, and the syntax looks like this.
x = [1, 2, 3, 4, 5, 6]
print (len(x))
You can call a list x, for example, and assign it the numbers 1 through 6. The square brackets indicate that we are using a Python list, and lists are mutable objects that I can add things to and rearrange as much as I want to. There's a built-in function for determining the length of a list called len, and if I type in len(x), that will give me back the number 6 because there are 6 numbers in my list.
Just to make sure, and again to drive home the point that this is actually running real code here, let's add another number in there, such as 4545. If you run this, you'll get 7 because now there are 7 numbers in that list:
x = [1, 2, 3, 4, 5, 6, 4545]
print (len(x))
The output of the previous code example is as follows:
7
Go back to the original example there. Now you can also slice lists. If you want to take a subset of a list, there's a very simple syntax for doing so:
x[:3]
The output of the above code example is as follows:
[1, 2, 3]
If, for example, you want to take the first three elements of a list, everything before element number 3, we can say :3 to get the first three elements, 1, 2, and 3. If you think about what's going on there, as far as indices go, like in most languages, we start counting from 0. So element 0 is 1, element 1 is 2, and element 2 is 3. Since we're saying we want everything before element 3, that's what we're getting.
Note
So, you know, never forget that in most languages, you start counting at 0 and not 1.
Now this can confuse matters, but in this case, it does make intuitive sense. You can think of :3 as meaning I want the first three elements, and I could change that to 4, just to make the point again that we're actually doing something real here:
x[:4]
The output of the above code example is as follows:
[1, 2, 3, 4]
Now if I put the colon on the other side of the 3, that says I want element 3 and everything after it. If I say x[3:], that's giving me the element at index 3 (counting 0, 1, 2, 3, so it's the fourth item in the list) and everything after it. So that's going to return 4, 5, and 6 in that example, OK?
x[3:]
The output is as follows:
[4, 5, 6]
You might want to keep this IPython/Jupyter Notebook file around. It's a good reference, because sometimes it can get confusing as to whether a slice includes the element at the index you give or stops just before it. So the best way is to just play around with it here and remind yourself.
One more thing you can do is have this negative syntax:
x[-2:]
The output is as follows:
[5, 6]
By saying x[-2:], I'm asking for the last two elements in the list. It means go backwards two from the end, and that will give me 5 and 6, because those are the last two things in my list.
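To keep the slicing rules straight, here is a small reference sketch that collects the cases covered so far. The variable names and the extra x[1:4] case are additions for illustration only; they are not part of the original notebook.

```python
# A compact reference for list slicing; names here are illustrative only.
x = [1, 2, 3, 4, 5, 6]

first_three = x[:3]   # indices 0, 1, 2 (stops just before index 3)
from_three = x[3:]    # index 3 through the end of the list
last_two = x[-2:]     # the final two elements
middle = x[1:4]       # indices 1, 2, 3 (start included, stop excluded)

print(first_three)
print(from_three)
print(last_two)
print(middle)

# Because the start is included and the stop is excluded,
# the two halves join back into the original list.
print(x[:3] + x[3:] == x)
```

The rule to remember is that the number before the colon is included and the number after the colon is not.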
You can also change lists around. Let's say I want to add a list to the list. I can use the extend function for that, as shown in the following code block:
x.extend([7,8])
x
The output of the above code is as follows:
[1, 2, 3, 4, 5, 6, 7, 8]
I have my list of 1, 2, 3, 4, 5, 6. If I want to extend it, I can say I have a new list here, [7, 8], and the brackets indicate that this is a new list in itself. This could be a list written inline, as it is here, or it could be referred to by another variable. You can see that once I do that, my list has 7 and 8 appended on to the end of it. Note that extend changes the existing list x in place rather than creating a separate list.
If you want to just add one more thing to that list, you can use the append function. So I just want to stick the number 9 at the end, there we go:
x.append(9)
x
The output of the above code is as follows:
[1, 2, 3, 4, 5, 6, 7, 8, 9]
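The difference between extend and append is easiest to see side by side. The following is a minimal sketch with made-up variable names (flat and nested are not from the original notebook): extend adds each element of its argument, while append adds its argument as one single element, even when that argument is itself a list.

```python
# extend adds every element of the argument to the list.
flat = [1, 2, 3]
flat.extend([4, 5])
flat.append(6)          # append adds exactly one element
print(flat)             # [1, 2, 3, 4, 5, 6]

# append with a list argument nests that list as a single element.
nested = [1, 2, 3]
nested.append([4, 5])
print(nested)           # [1, 2, 3, [4, 5]]
print(len(nested))      # 4, because the inner list counts as one element
```

So if you meant to add several numbers and ended up with a list inside your list, you probably wanted extend.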
You can also have complex data structures with lists. So you don't have to just put numbers in a list; you can put strings in it, you can put numbers in it, and you can put other lists in it. It doesn't matter. Python is a dynamically typed language, so you can pretty much put whatever kind of data you want, wherever you want, and it will generally be an OK thing to do:
y = [10, 11, 12]
listOfLists = [x, y]
listOfLists
In the preceding example, I have a second list that contains 10, 11, 12, that I'm calling y. I'll create a new list that contains two lists. How's that for mind blowing? Our listOfLists list will contain the x list and the y list, and that's a perfectly valid thing to do. You can see here that we have a bracket indicating the listOfLists list, and within that, we have another set of brackets indicating each individual list that is in that list:
[[ 1, 2, 3, 4, 5, 6, 7, 8, 9 ], [10, 11, 12]]
So, sometimes things like these will come in handy.
Dereferencing a single element
If you want to dereference a single element of the list you can just use the bracket like that:
y[1]
The output of the above code is as follows:
11
So y[1] will return element 1. Remember that y had 10, 11, 12 in it (see the previous example), and we start counting from 0, so element 1 will actually be the second element in the list, or the number 11 in this case, alright?
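The same bracket syntax chains when lists are nested. This is a small sketch that rebuilds the x, y, and listOfLists values from the earlier examples; the chained lookups and the final append are illustrative additions, not something shown in the original notebook.

```python
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [10, 11, 12]
listOfLists = [x, y]

# The first index picks the inner list, the second picks an element in it.
print(listOfLists[1][0])    # 10, the first element of y
print(listOfLists[0][-1])   # 9, the last element of x

# listOfLists holds references to x and y, not copies,
# so a change to y is visible through listOfLists as well.
y.append(13)
print(listOfLists[1])       # [10, 11, 12, 13]
```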
Finally, there's a built-in sort function that you can use:
z = [3, 2, 1]
z.sort()
z
So if I start with list z, which is 3, 2, and 1, I can call sort on that list, and z will now be sorted in order. The output of the above code is as follows:
[1, 2, 3]
z.sort(reverse=True)
z
The output of the above code is as follows:
[3, 2, 1]
If you need to do a reverse sort, you can just pass reverse=True as a parameter to that sort function, and that will put it back to 3, 2, 1.
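One thing worth knowing about sort is that it rearranges the list in place and returns None. If you want to keep the original order around, Python also has a built-in sorted function that returns a new list instead. The sketch below uses its own example values and variable names, which are not from the original notebook.

```python
z = [3, 1, 2]

# sorted() returns a new list and leaves z untouched.
ascending = sorted(z)
descending = sorted(z, reverse=True)
print(ascending)    # [1, 2, 3]
print(descending)   # [3, 2, 1]
print(z)            # [3, 1, 2], still in the original order

# list.sort() changes z itself and returns None.
result = z.sort()
print(z)            # [1, 2, 3]
print(result)       # None
```

A common mistake is writing z = z.sort(), which replaces your list with None.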
If you need to let that sink in a little bit, feel free to go back and read it a little bit more.
Tuples are just like lists, except they're immutable, so you can't actually extend, append to, or sort them. They are what they are, and they behave just like lists, apart from the fact that you can't change them. You indicate that something is a tuple, as opposed to a list, by using parentheses instead of square brackets. So you can see they work pretty much the same way otherwise:
#Tuples are just immutable lists. Use () instead of []
x = (1, 2, 3)
len(x)
The output of the previous code is as follows:
3
We can say x = (1, 2, 3). I can still use len on that to say that there are three elements in that tuple. If you're not familiar with the term tuple, a tuple can actually contain as many elements as you want. Even though the name sounds like it's based on the Latin for the number three, it doesn't mean you have three things in it. Often it only has two things in it, but it can have as many as you want, really.
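To make the immutability point concrete, here is a minimal sketch of what happens when you try to change a tuple. The names longer and single are illustrative additions; the behavior shown (a TypeError on item assignment) is standard Python.

```python
x = (1, 2, 3)

# Assigning to an element of a tuple is not allowed.
try:
    x[0] = 99
except TypeError as err:
    print("Cannot modify a tuple:", err)

# You can still build a new tuple from an existing one.
longer = x + (4,)
print(longer)        # (1, 2, 3, 4)
print(x)             # (1, 2, 3), unchanged

# A one-element tuple needs a trailing comma.
single = (5,)
print(len(single))   # 1
```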
We can also dereference the elements of a tuple, so element number 2 again would be the third element, because we start counting from 0, and that will give me back the number 6 in the following example:
y = (4, 5, 6)
y[2]
The output to the above code is as follows:
6
We can also, like we could with lists, use tuples as elements of a list.
listOfTuples = [x, y]
listOfTuples
The output to the above code is as follows:
[(1, 2, 3), (4, 5, 6)]
We can create a new list that contains two tuples. So in the preceding example, we have our x tuple of (1, 2, 3) and our y tuple of (4, 5, 6); then we make a list of those two tuples and we get back this structure, where we have square brackets indicating a list that contains two tuples indicated by parentheses. One thing that tuples are commonly used for when we're doing data science, or any sort of managing or processing of data really, is assigning variables to input data as it's read in. I want to walk you through a little bit of what's going on in the following example:
(age, income) = "32,120000".split(',')
print (age)
print (income)
The output to the above code is as follows:
32
120000
Let's say we have a line of input data coming in from a comma-separated value file, which contains an age, say 32, followed by a comma and an income, say 120000, for that age, just to make something up. What I can do is, as each line comes in, I can call the split function on it to actually separate that into a pair of values that are delimited by commas. split actually gives back a list, and I can take that result and assign it to two variables, age and income, all at once by defining a tuple of (age, income) and saying that I want to set that equal to what comes out of the split function.
So this is basically a common shorthand you'll see for assigning multiple fields to multiple variables at once. If I run that, you can see that the age variable actually ends up assigned to 32 and income to 120000 because of that little trick there. Both are still strings at this point, since split doesn't convert them to numbers. You do need to be careful when you're doing this sort of thing, because if the result doesn't have exactly the number of elements you're assigning to, you will get a ValueError exception, whether there is more stuff or less stuff than you expected to see.
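Here is one way to guard against that kind of mismatch. This is only a sketch: parse_line is a hypothetical helper invented for this illustration, and it assumes every line should hold exactly two integer fields, an age and an income.

```python
def parse_line(line):
    """Hypothetical helper: split an 'age,income' line into two integers."""
    fields = line.strip().split(',')
    if len(fields) != 2:
        # Fail with a clear message instead of a bare unpacking error.
        raise ValueError(f"expected 2 fields, got {len(fields)}: {line!r}")
    age, income = fields
    # split returns strings, so convert them before doing any math.
    return int(age), int(income)

age, income = parse_line("32,120000")
print(age, income)

# Without the check, a line with an extra field fails at the assignment itself.
try:
    (a, b) = "32,120000,extra".split(',')
except ValueError as err:
    print("Unpacking failed:", err)
```

Checking the field count up front gives you an error message that names the offending line, which helps a lot when one bad row is buried in a large file.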
Finally, the last data structure that we'll see a lot in Python is a dictionary, and you can think of that as a map or a hash table in other languages. It's a way to basically have a sort of mini-database, sort of a key/value data store that's built into Python. So let's say, I want to build up a little dictionary of Star Trek ships and their captains:
I can set up captains = {}, where the curly brackets indicate an empty dictionary. Now I can use square-bracket syntax to assign entries in my dictionary, so I can say captains for Enterprise is Kirk, for Enterprise D it is Picard, for Deep Space Nine it is Sisko, and for Voyager it is Janeway. Now I have, basically, this lookup table that will associate ship names with their captain, and I can say, for example, print(captains["Voyager"]), and I get back Janeway.
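The dictionary described above can be sketched as follows. This listing is reconstructed from the description in the text, so treat the exact layout as an assumption rather than the original notebook cell.

```python
# Start with an empty dictionary and add one entry per ship.
captains = {}
captains["Enterprise"] = "Kirk"
captains["Enterprise D"] = "Picard"
captains["Deep Space Nine"] = "Sisko"
captains["Voyager"] = "Janeway"

# Look up a captain by ship name.
print(captains["Voyager"])
```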
This is a very useful tool for doing lookups of some sort. Let's say you have some sort of an identifier in a dataset that maps to some human-readable name. You'll probably be using a dictionary to actually do that lookup when you're printing it out.
We can also see what happens if you try to look up something that doesn't exist. Well, we can use the get function on a dictionary to safely return an entry. So in this case, Enterprise does have an entry in my dictionary, so get just gives me back Kirk, but if I ask for the NX-01 ship, I never defined the captain of that, so it comes back with a None value in this example, which is better than throwing an exception, but you do need to be aware that this is a possibility:
print (captains.get("NX-01"))
The output of the above code is as follows:
None
The captain is Jonathan Archer, but you know, I'm getting a little bit too geeky here now.
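For comparison, here are the different ways a missing key can be handled. This is a minimal sketch using a shortened version of the captains dictionary; the "Unknown" default value is an arbitrary choice for illustration.

```python
captains = {"Enterprise": "Kirk", "Voyager": "Janeway"}

# get returns the value when the key exists, and None when it doesn't.
print(captains.get("Enterprise"))
print(captains.get("NX-01"))

# get also accepts a default to return in place of None.
print(captains.get("NX-01", "Unknown"))

# The in operator checks for a key without fetching its value.
print("NX-01" in captains)

# Square-bracket lookup raises KeyError for a missing key.
try:
    captains["NX-01"]
except KeyError as err:
    print("No entry for", err)
```

Which one you want depends on whether a missing key is a normal case to handle quietly or a real error you'd rather hear about.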