Cocky, overconfident and stupid

Photo by Harrison Kugler on Unsplash

In 2017, I invested everything I had into Crypto. I didn’t have much. But I was all in. As you can probably guess from the title, things didn’t go so well.

For the first couple of months I was ecstatic. Prices kept going up. I was making more money in 24 hours than I had ever made in a regular job.

I began watching crypto YouTubers “explain” how various coins worked. I bought into the hype completely.

I told my friends to buy Crypto. “You don’t want to tell your grandkids you missed out, do you?”

I’d go onto coinmarketcap…


A walkthrough of Data Transformations in PySpark

Image by Markus Spiske from Pexels

Data is now growing faster than processing speeds. One of the many solutions to this problem is to parallelise our computing on large clusters. Enter PySpark.

However, PySpark requires you to think about data differently.

Instead of looking at a dataset row-wise, PySpark encourages you to look at it column-wise. This was a difficult transition for me at first. I’ll tell you the main tricks I learned so you don’t have to waste your time searching for the answers.

Dataset

I’ll be using the Hazardous Air Pollutants dataset from Kaggle.

The dataset has 8,097,069 rows.

df = spark.read.csv('epa_hap_daily_summary.csv', inferSchema=True, header=True)
df.show()


Analysis of the mathematical loss functions behind CycleGAN

Image by Jun-Yan Zhu on GitHub

CycleGAN is a method of unpaired image to image translation. Unfortunately, it’s possible to use CycleGAN without fully understanding or appreciating the mathematics involved. That is a real shame.

In this article, I’ll walk through the mathematics behind Cycle-Consistent Adversarial Networks. Please read the paper for a more comprehensive explanation.
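For orientation, the losses in question can be written out as below. The notation is taken from the CycleGAN paper rather than from this excerpt: G maps domain X to Y, F maps Y back to X, D_X and D_Y are the two discriminators, and λ weights the cycle-consistency term.

```latex
\begin{aligned}
\mathcal{L}_{\text{GAN}}(G, D_Y, X, Y) &= \mathbb{E}_{y \sim p_{\text{data}}(y)}\big[\log D_Y(y)\big]
  + \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log\big(1 - D_Y(G(x))\big)\big] \\
\mathcal{L}_{\text{cyc}}(G, F) &= \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\lVert F(G(x)) - x \rVert_1\big]
  + \mathbb{E}_{y \sim p_{\text{data}}(y)}\big[\lVert G(F(y)) - y \rVert_1\big] \\
\mathcal{L}(G, F, D_X, D_Y) &= \mathcal{L}_{\text{GAN}}(G, D_Y, X, Y)
  + \mathcal{L}_{\text{GAN}}(F, D_X, Y, X)
  + \lambda\, \mathcal{L}_{\text{cyc}}(G, F)
\end{aligned}
```

The cycle-consistency term is what lets training work without matched before-and-after images: translating an image to the other domain and back should return something close to the original.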

Unpaired vs Paired

The key thing with CycleGAN is that we don’t have before and after images.

Let’s take the example shown above of converting a zebra into a horse (and vice-versa).

In a paired dataset the horse and zebra need to “match” each other. We’re essentially taking a horse and painting it…


A Useful Tip Learned from Productionizing a Classification Model

Photo by Wojciech Then on Unsplash

In production the stakes are high. People are going to be reading the outputs from the model. And the outputs better make sense.

Recently my team and I created an NLP classifier and put it into production on a large insurance dataset. It uses TfidfVectorizer and LinearSVC to classify free-text.

But I quickly realised just how different putting something into production is from the theory.
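As a rough illustration of the TfidfVectorizer and LinearSVC combination mentioned above, here is a minimal scikit-learn sketch. The texts and labels are made-up toy examples, not the insurance data, and the pipeline layout is an assumption about how the two pieces could be wired together rather than the model that went into production.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy, made-up free-text examples (hypothetical, for illustration only).
texts = [
    "water damage in kitchen after pipe burst",
    "burst pipe flooded the basement",
    "car rear-ended at traffic lights",
    "vehicle collision on the motorway",
]
labels = ["property", "property", "motor", "motor"]

# TF-IDF turns each text into a sparse weighted word-count vector;
# LinearSVC then fits a linear decision boundary on those vectors.
classifier = make_pipeline(TfidfVectorizer(), LinearSVC(random_state=0))
classifier.fit(texts, labels)

# Classify a new piece of free text.
prediction = classifier.predict(["pipe burst in the bathroom"])[0]
print(prediction)
```

Keeping the vectorizer and the classifier in one pipeline object means the same preprocessing is applied at training and at prediction time, which matters once the model is serving real inputs.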


Applying Neural Networks to the Meal Kit Industry

Photo by Lily Banse on Unsplash

So this is going to overfit.

Time series problems usually struggle with overfitting. This entire exercise became more of a challenge to see how I could prevent overfitting in time series forecasting.

I added weight decay and dropout. This should work to prevent overfitting. The network has embedding layers for categorical variables (which I vary in size) followed by dropout and batch normalisation (for continuous variables).

According to this article, you ideally want lower amounts of dropout and larger amounts of weight decay.

Dataset

The data is given by a meal kit company. …


8 Data Science Algorithms Explained Visually

Photo by Christina @ wocintechchat.com on Unsplash

Interviewer: “So how does Random Forests work?”

Me: “Umm…well… It’s kind of like a decision tree…and…um”

Interviewer: “How does Gradient Descent work?”

Me: “So…if you look at the equation…um…it’s kind of like…umm”

This was me during a real data science interview. As you can probably imagine, I didn’t get the job.

How could I fail so badly? I knew the maths. I understood the material. I could code this up in Python.

The problem was: I couldn’t communicate my understanding.

My mathematics courses and programming courses taught me how to code and how to think about data modelling. …


Photo by Jess Bailey on Unsplash

Simple Hacks to reduce the Gas you pay for Smart Contracts

I went from paying excessive amounts in gas to paying a reasonable amount after doing a course on Solidity development.

I’ll tell you the main tricks here so you don’t waste your time doing the entire course.

1. Smaller uints

If you’ve got multiple uints inside a struct, use the smallest uint type that fits each value. This lets Solidity pack them together and use less storage.

Convert this:


struct NormalStruct {
uint a;
uint b;
uint c;
}

To this:

struct MiniMe {
uint32 a;
uint32 b;
uint c;
}

MiniMe will cost less gas than NormalStruct because of struct packing.
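To see why packing helps, here is a simplified Python model of how struct fields are laid out in 32-byte storage slots in declaration order. The function name and the byte-size lists are illustrative assumptions; the sketch only covers fixed-size value types of at most 32 bytes and ignores everything else about gas accounting.

```python
SLOT_BYTES = 32  # one EVM storage slot holds 32 bytes


def count_storage_slots(field_sizes):
    """Count slots used when fields are packed in declaration order.

    A field goes into the current slot if it still fits; otherwise a
    new slot is started. (Hypothetical helper, simplified model.)
    """
    slots = 0
    free = 0  # bytes still free in the current slot
    for size in field_sizes:
        if size > free:
            slots += 1
            free = SLOT_BYTES
        free -= size
    return slots


normal_struct = [32, 32, 32]  # uint a; uint b; uint c;  (uint is uint256)
mini_me = [4, 4, 32]          # uint32 a; uint32 b; uint c;

print(count_storage_slots(normal_struct))  # 3
print(count_storage_slots(mini_me))        # 2
```

In this model the two uint32 fields share one slot, so MiniMe needs two slots instead of three. Field order matters too: putting the full-size uint between the two uint32 fields would give sizes [4, 32, 4], which takes three slots again.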

2. View Functions Don’t Cost You a Thing

View functions don’t cost any gas when they’re called…


Ethereum One Way Hashing functions explained

Image by author. Quote from here.

Can you represent 8,018,009 as a product of two prime numbers?

You can use whatever calculator or program you like. I’ll wait.

****

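I couldn’t do it by hand either. A computer can brute-force a number this small, but at the sizes used in real cryptography it can’t do so in any practical amount of time.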

This is the guiding principle behind Crypto Maths.

In this post we’ll go from a private key to an address using all the mathematical functions in between. Much of this comes from Chapter 4 of the ethereumbook.

Public Keys

You’ve heard many definitions of a Public Key. But here’s the real one:

“An Ethereum public key is a point on an elliptic curve, meaning it is a set of x…


Paired Dataset for Image to Image Translation in Old Films

TLDR

I’ve created a dataset for training film restoration models.

The video above shows a sample. On the left is a video of a great Star Wars scene. On the right is the same video made crappier.

The extracted frames are available here: https://www.kaggle.com/spiyer/old-film-restoration-dataset/. You could use this to train a film restoration model (like I’ve been doing). Enjoy!

Why did I do this?

Properly cleaned data is not as abundant as people make it out to be.

I’ve been trying to restore the Star Wars deleted scenes for some time now. My attempts have been far from perfect.

Recently I thought that if I…


Use Icevision and Detectron2 to detect swimming pools from aerial imagery

Photo by CHUTTERSNAP on Unsplash

Talk is cheap. Show me the code. — Linus Torvalds

There’s a lot of talk about swimming pool detection from aerial imagery.

You’re probably interested in a code first example. I was too. But I couldn’t find one.

I decided to make my own.

It’s not perfect. It’s not pretty. But it seems to work.

All code is on GitHub. Criticism is appreciated.

Dataset

To make this you’ll need data. Lots of labelled training data. This can be tricky to obtain. Particularly when your budget is as low as mine ($0).

I managed to find a government resource that gives you…

Neel Iyer

Data Scientist at Swiss Reinsurance. LinkedIn: https://www.linkedin.com/in/neel-iyer/
