How to prepare for Dataweekends

One of the questions we often receive is "I don't have any experience with Python, would I be able to follow the courses if I take a basic Python course before?". The answer is YES, but let us help you with that.

Dataweekends focus on data techniques and assume some familiarity with Python and programming. Since they only last two days, they cannot substitute for a bootcamp or a university degree. They are designed to speed up your learning curve in data science, machine learning or deep learning, giving you enough knowledge to continue learning on your own.

Over the years we've trained software engineers, business intelligence and analytics professionals, product managers, PhD students and managers. Everyone completed the weekend successfully and enjoyed it. Some of them did extra prep work before the weekend in order to take full advantage of it.

So what can you do to get ready for Dataweekends?

  1. Brush up your Python: learn about data structures, flow control, functions, classes, packages and "pythonic" constructs like list comprehensions, iterators and generators (a short refresher sketch follows this list). Three resources to get started are:
    1. Anaconda Python: an easy to install and manage Python distribution. Make sure you install the Python 3.6 version.
    2. Learn Python the Hard Way: a great way to start learning without putting too much effort into it
    3. Hacker Rank 30 days of code: more problem-solving oriented
  2. Familiarize yourself with Jupyter Notebook. This is the environment we will use for most of the course. It's very easy to use, but it can be disorienting at first, especially if you are used to working with an IDE or have never coded before. Here are a couple of resources to learn the basics:
    1. Getting Started
    2. Shortcuts & Tips
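
As a quick self-check, here is a short snippet touching the "pythonic" constructs mentioned in point 1; the names and values are made up purely for illustration. If you can read it comfortably, you are in good shape for the weekend:

# a list comprehension: squares of the even numbers from 0 to 9
squares = [n ** 2 for n in range(10) if n % 2 == 0]

# a generator function: yields values lazily, one at a time
def countdown(start):
    while start > 0:
        yield start
        start -= 1

for value in countdown(3):
    print(value)  # prints 3, 2, 1

# a small class with a method
class Greeter(object):
    def __init__(self, name):
        self.name = name

    def greet(self):
        return "Hello, {}!".format(self.name)

print(Greeter("Dataweekends").greet())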

Besides these, you can also read about a few libraries we use in every class. It is not strictly necessary, because we gradually introduce each of them, but some familiarity can help, especially if you don't plan to attend all the weekends in a row, but only a few of the advanced ones. Remember, Dataweekends are 50% hands-on, with plenty of time to practice your skills.

  1. Pandas: the standard library to manage tabular data in Python. We will use it a lot. Here's a 10-minute introduction to it.
  2. Matplotlib: the library for plotting in Python. We will use it to display data.
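
To give you a first taste of both, here is a minimal sketch; the tiny table of names and ages is made up purely for illustration:

import pandas as pd
import matplotlib.pyplot as plt

# a small, made-up table of people and ages
df = pd.DataFrame({'name': ['Ada', 'Grace', 'Alan'],
                   'age': [36, 45, 41]})

print(df.head())         # inspect the first rows
print(df['age'].mean())  # a quick summary statistic

# pandas delegates plotting to matplotlib under the hood
df.plot(kind='bar', x='name', y='age')
plt.show()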

Finally, here are a couple of articles on machine learning that you can use to whet your appetite. We encourage our students to ask questions during class, so if you read a little about things like supervised and unsupervised learning, model validation, feature engineering and cost functions, you may have a few questions that we'll be happy to address.


We are excited to get you started in your data science journey and look forward to seeing you in class!

Set up your Mac for Deep Learning with Python, Tensorflow and Keras

This is the first in a 4-article series on getting started with Deep Learning in Python.

In this guide I'll show you how to:

  • download and install Anaconda Python on your laptop
  • create a conda environment for your deep learning development
  • install the required packages in that environment

Download and install Anaconda Python

Anaconda is the leading open data science platform powered by Python. The open source version of Anaconda is a high-performance distribution of Python and R and includes over 100 of the most popular Python, R and Scala packages for data science.

It can be downloaded here. Python comes in two major versions: 2.7 and 3.x. Python 2.7 is considered legacy, while 3.x is the present and future of Python. Despite this, I often recommend installing Python 2.7 because of the larger library support. If you are a complete beginner, you may want to start directly with Python 3.x. A detailed explanation of the differences between 2.7 and 3.x can be found here, and here you can find a discussion about which version to choose.

In Dataweekends workshops we use Python 2.7, because that's what most of our users are already familiar with.

Once you've downloaded Anaconda, you should install it on your Mac following the instructions provided by the Graphical installer.


Unless you have a specific reason to do otherwise, we recommend installing Anaconda for the local user only.

Create a conda environment for deep learning

The team at Continuum has developed an extremely versatile package manager called conda. It quickly installs, runs, and updates packages and their dependencies. It can also query and search the package index and current installation, create new environments, and install and update packages into existing conda environments.

Conda environments are coherent collections of packages, pinned at specific versions, that help keep your Python code portable. For example, imagine you developed a short Python program that uses version 1.2 of a certain package. You want to share your code with a friend, but you are not sure she has the same package on her laptop. You ask, and it turns out she is currently using version 1.0 of that package as part of another project, so she doesn't want to upgrade it to 1.2, yet she would still like to test your script, which requires the upgrade. An environment solves this problem: your friend can keep both versions of the library, 1.0 and 1.2, in two separate environments, so that they do not interfere with one another.
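
Concretely, your friend could do something like the following sketch, using numpy here only as a stand-in for the hypothetical package in the example above:

# one environment per project, each pinning its own version of the package
conda create -n old_project numpy=1.11
conda create -n new_script numpy=1.12

# switch to the environment with the newer version and run the shared script
source activate new_script
python your_script.py   # placeholder name for the shared script
source deactivate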

Let's create an environment for our data science development. We'll call this environment dataweekends, but you can give it any name you like.

In a terminal window type:

conda create -n dataweekends python=2.7 pandas scikit-learn jupyter matplotlib

Hit Enter and answer y when prompted to proceed. If all goes to plan, at the end you should see this message:

# To activate this environment, use:
# > source activate dataweekends
#
# To deactivate this environment, use:
# > source deactivate dataweekends

You can then go ahead and activate the environment typing:

source activate dataweekends

This will prepend (dataweekends) to your terminal prompt. You can verify that you are in the correct active environment by typing:

which python

which should return:

/Users/<yourusername>/anaconda/envs/dataweekends/bin/python

Great! We have created an environment and successfully activated it. Now let's install keras.

Install Tensorflow and Keras

Tensorflow is an open source software library for Machine Intelligence, originally developed by researchers and engineers working on the Google Brain Team. Version 1.0 was announced in February, so that's the version we will install. There are several ways to install it; we'll use the pip method for this tutorial. In your active dataweekends environment terminal type:

pip install tensorflow
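
If you'd like to check the Tensorflow installation on its own before moving on, here is a minimal sketch in the TF 1.x graph-and-session style; the constants are arbitrary:

import tensorflow as tf

# build a tiny graph and run it inside a session (Tensorflow 1.x style)
hello = tf.constant('Hello, Tensorflow!')
total = tf.constant(2) + tf.constant(3)

with tf.Session() as sess:
    print(sess.run(hello))  # Hello, Tensorflow!
    print(sess.run(total))  # 5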

Keras is a high-level neural networks API, implemented in Python and capable of running on top of either TensorFlow or Theano. It was developed with a focus on enabling fast experimentation and it allows you to go from idea to result with the least possible delay.

Although Keras is also provided by the community channel of Anaconda packages (conda-forge), its most recent version is best installed with pip, so we'll go ahead and use that. In your active dataweekends environment terminal type:

pip install keras

At the time of writing this installs Keras version 1.2.2. Also, at the Tensorflow Dev Summit it was announced that Keras will become part of Tensorflow from version 1.1, so in the future it will already be installed with Tensorflow.

Voilà! You are done installing Anaconda, Tensorflow and Keras.

To test your installation you can type:

ipython

which will start the ipython console:

Python 2.7.13 |Continuum Analytics, Inc.| (default, Dec 20 2016, 23:05:08) 
Type "copyright", "credits" or "license" for more information.

IPython 5.3.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]:

You can then type:

import keras, tensorflow

to which it should reply: Using TensorFlow backend.

Congratulations!!! You have successfully set up your Mac for development with Python, Keras and Tensorflow!
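
If you'd like to go one step further than the import, here is a minimal Keras sketch you can paste into the same ipython session; the layer sizes and the random data are made up purely for illustration:

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# toy data: 10 samples with 4 features each, random targets
X = np.random.random((10, 4))
y = np.random.random((10, 1))

# a tiny fully-connected network
model = Sequential()
model.add(Dense(8, input_dim=4, activation='relu'))
model.add(Dense(1))
model.compile(optimizer='sgd', loss='mse')

model.fit(X, y, nb_epoch=5)  # nb_epoch is the Keras 1.x argument; Keras 2.x renames it to epochs
print(model.predict(X[:2]))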

To stop the ipython console just type CTRL+D twice.

To exit the dataweekends environment, type:

source deactivate

Troubleshooting

Check the version of installed package

If you are not sure about the version of Tensorflow (or any other package), it's easy to check. In the ipython console (make sure you started it from within your environment) type:

import tensorflow
tensorflow.__version__

If you are not at the current version, you can always upgrade it using pip as explained earlier.
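
For example, still from within the active dataweekends environment:

pip install --upgrade tensorflow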

Switching Keras backend

Keras' backend is set in a hidden file stored in your home directory. You can find it at ~/.keras/keras.json. You can open it with a text editor and you should see something like this:

{
  "image_dim_ordering": "tf", 
  "epsilon": 1e-07, 
  "floatx": "float32", 
  "backend": "tensorflow"
}

Switching backend is as easy as replacing tensorflow with theano in the last key. Then save the file and close it. If you then open ipython and import keras you should see:

Using Theano backend.
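
If you prefer not to edit the file, Keras also reads the KERAS_BACKEND environment variable, which overrides the setting in keras.json for that session only; for example:

KERAS_BACKEND=theano ipython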

You are now ready to move on to the second part of this tutorial.

Learn data science in a weekend

Data Weekends™ stem from two beliefs:

  • in-person learning is more engaging than watching a video
  • a couple of intense days of active learning have a more profound and lasting impact than a sequence of short moments of equal total duration

To pick up a new skill we need to invest time and dedication. We need interaction with masters and mentors, a strong curiosity about the subject and long hours spent perfecting our ability.

But how can we know that we like something if we don't try it out first?

And how can we quickly get past the initial feeling of "being lost"?

How can we learn the vocabulary of the new field at the same time as we learn the new tools and techniques?

Is it possible to jumpstart the learning and get quickly to a state that's more fun and engaging than the ABCs?

Data Weekends™ is born to answer these questions!

We crafted a two-day learning experience that brings its participants from "oh, I heard Data Science is cool, I should learn about it some day" to "wow, I can do so many things with the new predictive modeling techniques I've just learnt!".

Participants receive a solid scaffold, a strong base on which to build their future knowledge of data science and they also get to taste what data science looks like through hands-on practice and exercises. This eases the beginning of the journey into this vast discipline, while also bringing more fun and enjoyment.

The weekend comprises a fifty-fifty ratio of theory and exercises: it includes group work, pair programming exercises and coding challenges. Participants are guided and assisted by mentors with years of experience in the industry. The content is geared to the software engineer or programmer who is already past the barrier of learning to code and wants to add new predictive modeling tools to their toolkit.

Want to give it a try? Apply now!

Our first Machine Learning weekend

The first Data Weekend on Machine Learning in San Francisco was a great success!

Participants arrived in the morning, not sure what to expect. After all, they were the pioneers... the first curious ones wanting to spend a whole weekend cranking on Data Science.

We mingled over breakfast, and participants began to get to know each other and interact; then at 9:30 we got to work.

During the first day participants discovered the basic techniques of Machine Learning: regression, classification and clustering, and they learnt to apply them to data problems using Python. The balance of practice to theory was about half and half, so that they built knowledge and hands-on experience together. Real-world datasets were used throughout the examples, starting simple and then building up complexity.

We had breaks for coffee and lunch, though it was very hard to pause the sessions. People seemed to be genuinely engaged and having fun learning, so much so that they were willing to skip coffee breaks and lunch!

On the second day the atmosphere was more relaxed and familiar. People were getting to know each other, and we facilitated that with a short fun activity in the morning. Then, more advanced techniques were introduced. Students learnt how to validate a model and how to judge the results of a prediction task. The whole afternoon of the second day was dedicated to the final project, which involved setting up a simple website to serve predictions based on a model.

When the second day was over, we parted with a smile. The experiment was a success. Data Weekend was born!

Read the comments of people who participated:

A very good introduction, which provides a map of the field, a theoretical explanation and useful tool to start experiments on your own.
— Raffaele
A crash course on data science with real exposure to coding and a valuable and interesting final project.
A very well structured flow of the course with a great final project!
— Dimitar
Best way to get started in ML coming from a techie position.
What stood out the most for me were the programming exercises! They were well-designed, 1-page data science Python programs that I could understand by reading quickly and then working on improving or extending in some well-defined way. I feel much more comfortable with Python’s data science libraries now!
Most productive weekend in my last few years :)

Would you like to participate in the next Data Weekend? Sign up now!