MLOps - Day 8

Today's learning :-

  • If a column/field helps you predict the target, it can be a feature (predictor); otherwise it is not a feature.

  • In this example we can see that marks don't depend on Name or College ID, so these can't be features.
  • Using the correlation method, we can check whether College_id is a feature for marks or not.
  • Feature selection using correlation is known as the Filter approach to feature selection.
  • But sometimes the Filter approach doesn't give a proper result; in that case we go for the Embedded technique of feature selection.
  • Using model coefficients is one way of doing feature selection with the Embedded technique.
  • Feature selection using coefficients is a slow technique because we first have to create a model, then train it, then read the model's coefficients, which is a long process (Create model > Train model > Find model coefficients = coefficient feature selection technique).
  • If you want to do feature selection with the Embedded technique (because it is usually more accurate than the Filter technique) but don't want to use the coefficient technique because it is slow, there is one more technique: "Lasso / L1 Regularization".
  • Lasso is a faster and accurate technique of feature selection: it shrinks the coefficients of less useful features to exactly zero, so the features left with non-zero coefficients are the selected ones.
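Code:-
from sklearn.linear_model import Lasso
from sklearn.feature_selection import SelectFromModel
# X = numeric feature columns, y = target column (must already be defined)
sel = SelectFromModel(Lasso())   # wrap Lasso so it acts as a feature selector
sel.fit(X, y)                    # train Lasso on the data
sel.get_support()                # boolean mask: True = feature is selected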
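To make the three approaches above (Filter with correlation, Embedded with coefficients, Embedded with Lasso) concrete, here is a minimal sketch on made-up data. The column names (hours_studied, college_id, marks), the toy data itself, the scaling step and alpha=1.0 are all assumptions for illustration; they are not from the original example.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: marks depend on hours studied, not on college ID.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "hours_studied": rng.uniform(0, 10, n),
    "college_id": rng.integers(1, 50, n).astype(float),
})
df["marks"] = 8.0 * df["hours_studied"] + rng.normal(0, 2.0, n)

X = df[["hours_studied", "college_id"]]
y = df["marks"]

# Filter approach: absolute correlation of each column with the target.
# No model is trained here.
corr_with_target = X.corrwith(y).abs()

# Embedded approach (coefficients): create model > train model > read coefficients.
# Scaling first puts the coefficients on a comparable footing.
X_scaled = StandardScaler().fit_transform(X)
coefs = pd.Series(
    np.abs(LinearRegression().fit(X_scaled, y).coef_), index=X.columns
)

# Embedded approach (Lasso): the L1 penalty pushes weak coefficients to zero,
# and SelectFromModel keeps the columns whose coefficient survives.
selector = SelectFromModel(Lasso(alpha=1.0)).fit(X_scaled, y)
selected = list(X.columns[selector.get_support()])

print(corr_with_target)
print(coefs)
print(selected)
```

In this toy setup all three views should agree that hours_studied matters and college_id does not. On real data they can disagree, and the right alpha for Lasso has to be found by trying values.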
  • If we run experiments (hit & try) on data to find a formula/output, this kind of research is known as Data Science, and the person who does it is known as a data scientist.
  • If we change a string to a number (One -> 1), this process is known as encoding/transformation, and it is one example of Feature Engineering.
  • If you want your model to accept your data (most models only work with numbers), you have to do Feature Engineering.
  • The OneHot encoding technique can be used to transform categorical data into new variables: one 0/1 column per category.
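A small sketch of the two kinds of encoding mentioned above, using a made-up four-row State column (the sample rows and the integer codes are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical sample rows, not taken from any real dataset.
df = pd.DataFrame({"State": ["New York", "California", "Florida", "California"]})

# Simple encoding: map each string label to a number.
codes = df["State"].map({"California": 0, "Florida": 1, "New York": 2})

# One-hot encoding: one 0/1 column per category (columns come out sorted).
dummies = pd.get_dummies(df["State"], dtype=int)

# With k categories, k-1 columns are enough: the dropped one is implied
# when all the others are 0. drop_first=True removes the first column.
reduced = pd.get_dummies(df["State"], dtype=int, drop_first=True)

print(codes.tolist())
print(dummies)
print(reduced)
```

One-hot encoding is usually preferred over plain integer codes for a linear model, because integer codes suggest an order (California < Florida < New York) that does not really exist.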
Code :- To predict startup company profit.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

dataset = pd.read_csv('50_Startups.csv')
dataset.columns                      # inspect the available columns
X = dataset[['R&D Spend', 'Administration', 'Marketing Spend', 'State']]
y = dataset['Profit']
# One-hot encode the categorical State column
state = X['State']
state_dummy = pd.get_dummies(state)
# Keep the three numeric columns; .copy() avoids a SettingWithCopyWarning below
X_new = X.iloc[:, 0:3].copy()
# Keep only the first two dummy columns; the third state is implied when both are 0
f_state = state_dummy.iloc[:, :2]
X_new[['California', 'Florida']] = f_state
# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X_new, y, test_size=0.20, random_state=42)
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)       # predicted profit for the test rows
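The code above stops at y_pred. One possible way to check how good the predictions are is the R² score, and the same one-hot step can also be done inside a scikit-learn pipeline. This is only a sketch: the data below is a synthetic stand-in for 50_Startups.csv with made-up values, so the score it prints says nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in data (all values and the profit formula are invented).
rng = np.random.default_rng(42)
n = 50
data = pd.DataFrame({
    "R&D Spend": rng.uniform(0, 160_000, n),
    "Administration": rng.uniform(50_000, 180_000, n),
    "Marketing Spend": rng.uniform(0, 470_000, n),
    "State": rng.choice(["California", "Florida", "New York"], n),
})
data["Profit"] = (
    50_000
    + 0.8 * data["R&D Spend"]
    + 0.03 * data["Marketing Spend"]
    + rng.normal(0, 5_000, n)
)

X = data.drop(columns="Profit")
y = data["Profit"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42
)

# One-hot encode State (dropping one category); numeric columns pass through.
encode = ColumnTransformer(
    [("state", OneHotEncoder(drop="first"), ["State"])],
    remainder="passthrough",
)
model = make_pipeline(encode, LinearRegression())
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
score = r2_score(y_test, y_pred)   # 1.0 would be a perfect fit
print(score)
```

Putting the encoder in the pipeline means the same transformation is applied automatically at prediction time, so there is no need to build the dummy columns by hand for new data.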
