Randomized Search for Classification and Regression Models

Image created by author: TechFitLab
  • What is Hyperparameter Tuning?
  • What is Randomized Search?
  • Implementing Randomized Search for Regression
  • Implementing Randomized Search for Classification

In machine learning, hyperparameter optimization or tuning is the problem of choosing a set of optimal hyperparameters for a learning algorithm. A hyperparameter is a parameter whose value is used to control the learning process. By contrast, the values of other parameters are learned. — Wikipedia

In other words, hyperparameters are points of choice or configuration that allow a machine learning model to be customised for a specific task or dataset.

Randomized Search is a method in which random combinations of hyperparameters…
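The idea can be sketched in plain Python, without any library's search API. The objective function, the search space and the names `random_search` and `toy_objective` below are all made up for illustration; in practice the score would come from cross-validating a real model.

```python
import random


def random_search(objective, space, n_iter, seed=0):
    """Try n_iter random hyperparameter combinations and keep the best one."""
    rng = random.Random(seed)  # seeded so repeated runs sample the same combinations
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        # Draw one value per hyperparameter, independently and at random.
        params = {name: rng.choice(values) for name, values in space.items()}
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(params):
    """Hypothetical stand-in for a validation score (higher is better)."""
    return -((params["max_depth"] - 5) ** 2) - abs(params["learning_rate"] - 0.1)


space = {
    "max_depth": [2, 3, 5, 8, 12],
    "learning_rate": [0.01, 0.05, 0.1, 0.3],
}
best_params, best_score = random_search(toy_objective, space, n_iter=10)
print(best_params, best_score)
```

Only `n_iter` combinations are evaluated, rather than every combination in the space, which is the trade-off that separates a randomized search from an exhaustive grid search.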

Identify, visualise and fill outliers in Python

Photo by Daniel Honies on Unsplash
  • How to identify Outliers in the Dataframe?
  • How to visualise Outliers in the Dataframe?
  • How to fill Outliers in Python?

In statistics, an outlier is a data point that differs significantly from other observations. An outlier may be due to variability in the measurement or it may indicate experimental error; the latter are sometimes excluded from the data set. An outlier can cause serious problems in statistical analyses. — Wikipedia

In other words, outliers are observations that lie far from the rest of the data. We should craft our assumptions about what is a “normal” expected value…
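One common convention for flagging such points is the interquartile range (IQR) rule. The sketch below uses it as an assumption, not necessarily the approach taken in the article, and the function name `iqr_outliers` and the sample readings are hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


readings = [10, 12, 11, 13, 12, 11, 95]  # made-up sample data
outliers = iqr_outliers(readings)
print(outliers)
```

The multiplier `k=1.5` is the conventional default; a larger value flags fewer points.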

Standardisation, Normalisation and Binning in Python

Photo by Riccardo Chiarini on Unsplash
  • What is Feature Scaling?
  • Which Algorithms require Feature Scaling?
  • Standardisation, Normalisation and Binning

Feature scaling is a method used to normalise the range of the independent variables, or features, of a dataset. In some scenarios it also speeds up an algorithm's calculations. Without feature scaling, a machine learning algorithm tends to give more weight to features with larger numeric values and less weight to features with smaller ones, regardless of the units involved.
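As a rough sketch, the two most common scaling methods can be written in plain Python. The function names and the sample heights are made up for illustration, and both functions assume the input values are not all identical (otherwise they would divide by zero).

```python
import statistics


def standardise(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def min_max_normalise(values):
    """Rescale linearly so the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]  # made-up sample data
standardised = standardise(heights_cm)
normalised = min_max_normalise(heights_cm)
print(standardised)
print(normalised)
```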

In general, algorithms that exploit distances or similarities between data samples, such as k-NN, K-Means, PCA and SVM, are sensitive…

Step by Step Breakdown

Image created by author: TechFitLab

List comprehension is a syntactic construct in Python for creating a list from an existing list or other iterable. List comprehensions are usually somewhat faster than an equivalent for loop with .append(), because the interpreter can build the list without looking up and calling .append() on every iteration.

Use Case: In case you’re using a for loop along with .append() to create a list, List Comprehension is a good alternative.

In the below example, we are creating a list (enclosed in [] brackets) based on a for loop in range(0,10).
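The original example is not reproduced here, so the snippet below is a stand-in that assumes the list holds the squares of the numbers in range(0, 10). It shows the for loop version and the equivalent list comprehension side by side.

```python
# Version with a for loop and .append()
squares_loop = []
for n in range(0, 10):
    squares_loop.append(n * n)

# Equivalent list comprehension: expression first, then the loop clause
squares = [n * n for n in range(0, 10)]

print(squares)
```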

Elon Musk used to own plenty of real estate in California, but in 2020 he went on a selling spree and started selling his properties one by one.

Working on sustainable energy for Earth with Tesla & protecting the future of consciousness by making life multiplanetary with SpaceX. Also, AI risk mitigation with Neuralink & fixing traffic with Boring. — Elon Musk

He took the task on himself a few weeks later, listing a total of seven properties scattered across the state for a combined $137 million. …

Apache Airflow is an open-source workflow management platform written in Python

Airflow lets you create workflows in Python, then schedule them to run at a chosen frequency and monitor them easily.

  1. Install Apache Airflow
  2. Create DAG (Directed Acyclic Graph)
  3. Import DAG in Airflow Database
  4. Turn on Webserver and Scheduler

In Airflow, a DAG – or a Directed Acyclic Graph – is a collection of all the tasks you want to run, organised in a way that reflects their relationships and dependencies.

For example, a simple DAG could consist of three tasks: A, B, and C. It could say that A has to run successfully before B…
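The dependency idea can be sketched with the standard library's `graphlib` module (Python 3.9+). This is plain Python, not the Airflow API, and it assumes for illustration that task C has no dependencies of its own.

```python
from graphlib import TopologicalSorter

# Map each task to the set of tasks that must finish before it can start.
# Assumption for this sketch: C does not depend on anything.
dependencies = {
    "A": set(),
    "B": {"A"},  # A has to run successfully before B
    "C": set(),
}

# static_order() yields the tasks in an order that respects every dependency,
# and raises graphlib.CycleError if the graph is not acyclic.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

The "acyclic" part of the name matters: if B also had to run before A, no valid order would exist and the sorter would raise an error.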

Cron Jobs

Image Usage Licence through

The cron daemon is a long-running process that executes commands at specific dates and times. You can use this to schedule activities, either as one-time events or as recurring tasks.

If you are on a Mac (or Linux), you can use crontab, which is a scheduling tool that will run jobs (scripts) at regular intervals.
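A crontab entry is five schedule fields (minute, hour, day of month, month, day of week) followed by the command to run. The small sketch below splits one entry into those parts; the helper name `parse_crontab_line` and the script path are hypothetical.

```python
def parse_crontab_line(line):
    """Split one crontab entry into its five schedule fields and the command."""
    parts = line.split(None, 5)  # at most 6 pieces; the command keeps its spaces
    if len(parts) < 6:
        raise ValueError("expected five schedule fields followed by a command")
    names = ("minute", "hour", "day_of_month", "month", "day_of_week")
    entry = dict(zip(names, parts[:5]))
    entry["command"] = parts[5]
    return entry


# Hypothetical entry: run a backup script at 02:30 every Monday.
entry = parse_crontab_line("30 2 * * 1 /usr/bin/python3 /home/user/backup.py")
print(entry)
```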

On Microsoft Windows, cron jobs are known as Scheduled Tasks. They can be added through the Windows Task Scheduler user interface, by using PowerShell or with help of schtasks.exe.

Crontab stands for “cron table” because it uses the job scheduler cron to execute tasks; cron

Image Usage Licence through

In plain English, decorators allow you to ‘decorate’ a function or class in Python by adding new functionality. In other words, they let you extend the behaviour of the original function without permanently modifying it.
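A minimal sketch of a decorator follows. The names `log_calls` and `add` are made up for illustration; the decorator prints a message before each call and otherwise leaves the wrapped function's behaviour unchanged.

```python
import functools


def log_calls(func):
    """Decorator that prints a message every time func is called."""
    @functools.wraps(func)  # keep the original function's name and docstring
    def wrapper(*args, **kwargs):
        print(f"Calling {func.__name__}")
        return func(*args, **kwargs)
    return wrapper


@log_calls
def add(a, b):
    return a + b


result = add(2, 3)
print(result)
```

Because `functools.wraps` is used, the undecorated function stays reachable as `add.__wrapped__`, so the original has not been permanently modified.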

Let’s take an example of this dataframe:

Image Usage Licence through

Why Linux?

Linux is an open-source operating system that lets a user control every aspect of the operating system. Unlike Windows, you usually don’t need to reboot a Linux server after every update or patch. Partly because of this, Linux runs more servers on the Internet than any other operating system. Here are five of the highest-profile users of the Linux desktop worldwide.

  • Google
  • CERN
  • NASA
  • French Gendarmerie
  • US Department of Defense

Linux, DevOps, cloud and security are the top skill sets wanted from potential employees. Among hiring managers, 74% say that Linux is the most in-demand skill they’re seeking in new hires.


How to execute SQL statements in R?

Image Usage Licence through

In this tutorial, we will be using ROracle, an R package that enables access to Oracle Database. We will also need to install the DBI package, which defines an interface for communication between R and relational database management systems. I have added installation notes at the bottom of this post to get you set up for using SQL in R.

First, we will initialise the Oracle Driver and create connection variables such as hostname, port and sid. Then we will create our connect.string by passing all the variables…


Data Scientist with 9+ years of experience; YouTube Content Creator; qualified Gym Instructor and Personal Trainer. 👨🏻‍💻 🎥 🏋🏻

