to be precise — recommender systems without feedback loops.

Photo by Javier Allegue Barros on Unsplash

Most people have come across recommender systems in their daily digital life, perhaps on Netflix or Amazon (Shopping & Prime Video). These companies have popularized the idea that a system can learn what we like and recommend items similar to those we enjoyed, or items that users similar to us have liked. They are a great way of exploring and finding things that you want to see. But…

Imagine you have watched the Star Wars series; as soon as you finish, it starts suggesting more options similar to those movies. …
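To make the “recommend similar things” idea concrete, here is a toy sketch (not code from the article): item-to-item cosine similarity over a made-up user–item rating matrix, where items rated highly by the same users end up close together. All titles and ratings below are invented for illustration.

# Toy item-to-item similarity sketch (illustrative only).
# Rows are users, columns are movies; values are made-up ratings (0 = not rated).
import numpy as np

movies = ["A New Hope", "Empire Strikes Back", "Return of the Jedi", "The Notebook"]
ratings = np.array([
    [5, 4, 5, 0],
    [4, 5, 4, 1],
    [0, 1, 0, 5],
    [5, 5, 4, 0],
], dtype=float)

# Cosine similarity between item columns: similar rating patterns -> similar items.
norms = np.linalg.norm(ratings, axis=0)
similarity = (ratings.T @ ratings) / np.outer(norms, norms)

# Recommend the items most similar to the one just watched.
watched = movies.index("Return of the Jedi")
ranked = np.argsort(similarity[watched])[::-1]
print([movies[i] for i in ranked if i != watched])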


why I chose Python and Visual Studio Code for my data science projects…

Photo by Campaign Creators on Unsplash

I’ve been in the Data Science field for more than 6 years and have tried and tested different tools, from programming in the terminal to text editors and cloud platforms. I’ve used both Python and R, but for the past few years I have worked in Python only.

In this article, I’ll write about

  • Why I prefer Python over R
  • My preferred text editor
  • And other tools I use

Why I prefer Python over R

Usability

Python is a general-purpose programming language and can be used for all kinds of tasks: web scraping, automation, building websites, building APIs, and of course building machine learning models. …


PySpark’s MLlib includes all the essential machine learning algorithms, and the multilayer perceptron, which is simply a feed-forward neural network, is one of them.

A high-level diagram explaining the input, hidden, and output layers in a multilayer perceptron.

Introduction

PySpark is the Python interface to Apache Spark. Apache Spark is a distributed (cluster) computing framework for big data analysis, written in Scala.

Today we will look at how to build a Multilayer Perceptron Classifier (neural net) on the Iris dataset, including data preprocessing and evaluation.

Pre-requisites:

  1. You have PySpark available, either on a local machine, in Google Colab, or in Databricks.
  2. The Iris dataset, which I downloaded from Kaggle: https://www.kaggle.com/uciml/iris. It’s a UCI ML dataset that I’ll be using in this article.

If running on a local machine, I use the findspark package and establish a SparkSession, as below:
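The snippet below is a minimal sketch of that setup; the app name is a placeholder, not necessarily the one used in the article.

# Locate the local Spark installation and make it importable.
import findspark
findspark.init()

from pyspark.sql import SparkSession

# Start (or reuse) a local SparkSession using all available cores.
spark = (
    SparkSession.builder
    .master("local[*]")
    .appName("iris-mlp")
    .getOrCreate()
)

For context, the rest of the workflow described above (preprocessing, training the multilayer perceptron, and evaluation) typically looks roughly like the sketch below. The column names are assumed from the Kaggle CSV (SepalLengthCm, SepalWidthCm, PetalLengthCm, PetalWidthCm, Species), and the layer sizes are just one reasonable choice, not necessarily the article’s.

from pyspark.ml import Pipeline
from pyspark.ml.feature import StringIndexer, VectorAssembler
from pyspark.ml.classification import MultilayerPerceptronClassifier
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

# Load the Kaggle CSV.
df = spark.read.csv("Iris.csv", header=True, inferSchema=True)

# Assemble the four measurements into a feature vector and index the label column.
assembler = VectorAssembler(
    inputCols=["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"],
    outputCol="features",
)
indexer = StringIndexer(inputCol="Species", outputCol="label")

# 4 inputs -> two hidden layers of 8 units -> 3 output classes.
mlp = MultilayerPerceptronClassifier(
    layers=[4, 8, 8, 3], seed=42, featuresCol="features", labelCol="label"
)

pipeline = Pipeline(stages=[assembler, indexer, mlp])
train, test = df.randomSplit([0.8, 0.2], seed=42)
model = pipeline.fit(train)

# Evaluate accuracy on the held-out split.
predictions = model.transform(test)
evaluator = MulticlassClassificationEvaluator(labelCol="label", metricName="accuracy")
print("Test accuracy:", evaluator.evaluate(predictions))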


Pyenv and Pipenv are essential tools if you work on multiple projects that need to be deployed to production while maintaining a clean codebase.

A high-level overview of how Pyenv and Pipenv differ and how, together, they solve the bigger problem.

We often run into problems when working on different projects on a local system:

  1. We might need different Python versions for different projects (less common).
  2. We might need Python packages pinned to particular versions for each project (more likely).
  3. We might want a separate virtual environment per project for easy deployment.

After stumbling over this problem, I found a neat solution using two excellent tools: Pyenv and Pipenv.

Pyenv manages Python versions, while Pipenv creates a virtual environment for each project and manages that project’s Python packages and their dependencies.
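Pyenv and Pipenv themselves are driven from the shell, but a quick way to confirm from inside a project which interpreter and environment it actually resolved to is a few lines of Python (an illustrative sketch, not part of either tool):

# Sanity-check the active interpreter and environment.
import sys
import platform

print("Interpreter:", sys.executable)                 # should live inside the project's virtualenv
print("Python version:", platform.python_version())   # should match the pyenv-selected version
print("Inside a virtualenv:", sys.prefix != getattr(sys, "base_prefix", sys.prefix))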

This is a great way of working on different projects…

Avinash Kanumuru

Data Scientist | Machine Learning Engineer | Data & Analytics Manager
