  • Styling plots with Seaborn

    So you have spent hours crunching numbers, figuring out how to use numpy and pandas, and you are finally ready for the fun stuff: plotting! After fighting with matplotlib for some time, there it is, you got it. Your first plot.

  • Pandas' query method

    Pandas is great when we need to select or filter our data according to some criteria. Generally, no loops are needed. A clear statement of what we want is enough.

  • Visualizing Outliers with d3.js

    I really like data visualization. I think many concepts could be easily explained with the right kind of visuals. That is why I was happy to attend the data visualization talk at the last PyData Berlin. Oddly enough, the visualizations presented were not created with matplotlib, bokeh, or any of the other tools familiar to the Python community. They used d3.js.

  • Visualizing the bivariate normal distribution and its properties

    The normal distribution plays a central role in statistics and the natural sciences. Many natural phenomena (e.g. population height) are closely described by a normal distribution. The picture that comes to mind when talking about the normal distribution is that of a bell-shaped curve. Once we consider higher-dimensional random variables, it becomes more difficult, or even impossible, to draw pictures that help us grasp the features of the variables’ distribution.

  • What's the difference between loc, iloc, and ix?

    Slicing is part of the regular trade of data analysis. It means taking a part of your data set for further inspection. pandas offers at least three ways of slicing data: .loc[], .iloc[], and .ix[]. It is really easy to mistake one for another. Here is a quick reference to help you tell them apart.