MLOps - Day 8

Today's learning :-

  • A column/field can be a feature (predictor) only if it helps you predict the target; otherwise it is not a feature.

  • In this example we can see that marks don't depend on Name or College ID. That means these can't be my features.
  • Using the correlation method we can check whether College_id is a feature for marks or not.
  • Feature selection using correlation is known as the Filter approach to feature selection.
  • But sometimes it doesn't give a proper result (correlation only measures the linear relationship between one column and the target), and then we have to go for the Embedded technique of feature selection.
  • Using model coefficients is one way of doing feature selection with the Embedded technique.
  • Feature selection using coefficients is a slow technique because first of all we have to create a model, then train it, then read the model coefficients, which is a long process (Create model > Train model > Find model coefficient = coefficient feature selection technique).
  • If you want to do feature selection using the Embedded technique (because it is usually more accurate than the Filter technique) but don't want to use the coefficient technique because it is a slow process, then we have one more technique: "Lasso / L1 Regularization".
  • Lasso is a convenient and accurate technique for feature selection: the L1 penalty shrinks the coefficients of unhelpful features to exactly zero while the model trains, so there is no separate step of inspecting coefficients by hand.
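Code:-
from sklearn.linear_model import Lasso
from sklearn.feature_selection import SelectFromModel
# X = feature columns, y = target column (both must already be defined)
sel = SelectFromModel(Lasso())  # Lasso() uses the default alpha=1.0
sel.fit(X, y)
sel.get_support()  # boolean mask: True for every selected feature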
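To compare the three approaches described above (Filter with correlation, Embedded with coefficients, Embedded with Lasso), here is a minimal sketch on made-up data. The column names, the synthetic marks formula, and the cut-off values 0.2 and 1.0 are all my assumptions for illustration; they are not from the original example.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): marks depend on study hours and attendance,
# while college_id is just a random identifier.
rng = np.random.default_rng(0)
n = 500
features = pd.DataFrame({
    "hours_studied": rng.uniform(0, 10, n),
    "attendance": rng.uniform(50, 100, n),
    "college_id": rng.integers(1000, 2000, n).astype(float),
})
marks = (5 * features["hours_studied"]
         + 0.4 * features["attendance"]
         + rng.normal(0, 2, n))

# 1) Filter: keep columns whose absolute correlation with the target
#    is above an arbitrary cut-off (0.2 here).
corr = features.corrwith(marks).abs()
filter_selected = corr[corr > 0.2].index.tolist()

# Put all columns on the same scale so coefficients are comparable.
X_scaled = StandardScaler().fit_transform(features)

# 2) Embedded (coefficients): create model > train model > read coefficients,
#    then keep columns with a large absolute coefficient (cut-off 1.0 here).
linreg = LinearRegression().fit(X_scaled, marks)
coef = pd.Series(np.abs(linreg.coef_), index=features.columns)
coef_selected = coef[coef > 1.0].index.tolist()

# 3) Embedded (Lasso): the L1 penalty sets unhelpful coefficients to zero,
#    and SelectFromModel keeps the columns with non-zero coefficients.
sel = SelectFromModel(Lasso(alpha=1.0)).fit(X_scaled, marks)
lasso_selected = features.columns[sel.get_support()].tolist()

print(filter_selected, coef_selected, lasso_selected)
```

On this toy data all three methods should drop college_id. On real data the cut-offs and the Lasso alpha need tuning, and the methods may disagree.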
  • If we run experiments (hit and trial) on data to find a formula/output, this kind of research is known as Data Science, and the person who does it is known as a data scientist.
  • If we change a string to a number (One -> 1), the process is known as encoding/transformation, and it is one example of Feature Engineering.
  • Most models only accept numeric input, so if you want your model to work with your data you have to do Feature Engineering.
  • The OneHot encoding technique can be used to transform categorical data into new numeric (dummy) variables, one per category.
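The two kinds of encoding mentioned above can be sketched on a tiny table. The DataFrame, its column names, and the word-to-number mapping are hypothetical, made up only to show the idea.

```python
import pandas as pd

# Tiny made-up table (assumption): one column of number words, one of states.
df = pd.DataFrame({
    "count_word": ["One", "Two", "Three", "Two"],
    "state": ["New York", "California", "Florida", "California"],
})

# Encoding a string as a number (One -> 1) with an explicit mapping.
word_to_number = {"One": 1, "Two": 2, "Three": 3}
df["count_number"] = df["count_word"].map(word_to_number)

# One-hot encoding: one 0/1 column per category, exactly one 1 in each row.
state_dummies = pd.get_dummies(df["state"], dtype=int)

print(df)
print(state_dummies)
```

A plain mapping suits values with a natural order (One < Two < Three). States have no order, so one-hot encoding avoids telling the model that one state is "bigger" than another.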
Code :- To predict startup company profit.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

dataset = pd.read_csv('50_Startups.csv')
dataset.columns
X = dataset[['R&D Spend', 'Administration', 'Marketing Spend', 'State']]
y = dataset['Profit']

# One-hot encode the categorical State column
state = X['State']
state_dummy = pd.get_dummies(state)

# Keep the three numeric columns; .copy() avoids SettingWithCopyWarning below
X_new = X.iloc[:, 0:3].copy()
# Keep only the first two dummy columns; the remaining state is implied when both are 0
f_state = state_dummy.iloc[:, :2]
X_new[['California', 'Florida']] = f_state

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X_new, y, test_size=0.20, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
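The listing above stops at y_pred. As a rough sketch of how the predictions could be checked, the block below repeats the same steps on synthetic data and scores the result with r2_score. The generated numbers and the profit formula are my assumptions standing in for 50_Startups.csv, and get_dummies(..., drop_first=True) is used here as a shortcut for keeping all but one of the state columns.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for 50_Startups.csv (assumption, not the real data).
rng = np.random.default_rng(42)
n = 50
dataset = pd.DataFrame({
    "R&D Spend": rng.uniform(0, 150_000, n),
    "Administration": rng.uniform(50_000, 180_000, n),
    "Marketing Spend": rng.uniform(0, 450_000, n),
    "State": rng.choice(["New York", "California", "Florida"], n),
})
dataset["Profit"] = (50_000
                     + 0.8 * dataset["R&D Spend"]
                     + 0.03 * dataset["Marketing Spend"]
                     + rng.normal(0, 2_000, n))

# One-hot encode State and drop the first category in one call.
X = pd.get_dummies(dataset.drop(columns="Profit"),
                   columns=["State"], drop_first=True, dtype=float)
y = dataset["Profit"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# R^2 close to 1 means the predictions track the held-out profits well.
score = r2_score(y_test, y_pred)
print(list(X.columns))
print(round(score, 3))
```

The score is high here only because the synthetic profit was built from a linear formula; the real dataset will give its own number.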
