MLOps - Day 9

Today's learning :-
    • Started with a quick revision of the topics studied over the last few days.
    • Dimensionality Reduction - Feature Elimination
    • Feature Selection (FS) - Filter (Correlation), Embedded (Lasso/L1 Regression), Wrapper (OLS - Ordinary Least Squares)
    • The filter method is fast but less accurate, because it scores each feature without fitting a model.
    • The embedded method (coefficients) is accurate but slower, because the selection happens while the model is being trained.
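    • The wrapper method selects features by repeatedly fitting a model and checking the result, which gives competitively high accuracy. The repeated fitting costs more compute than a filter when there are many features, but it is quick on a small dataset like the one used here. That is the main reason the wrapper method is used a lot in feature selection. A similar fit-and-evaluate idea is also used in the background in Deep Learning (DL) and Neural Networks (NN).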
    • Feature Extraction (FE) - For this we use PCA (Principal Component Analysis). Why do we do feature extraction? One of the reasons is performance: reducing the number of dimensions can improve performance.
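    The three ideas above (filter, embedded, feature extraction) can be sketched side by side. This is a minimal illustration on synthetic data, not the dataset used in this post: the column names x1..x4, the alpha value, and the number of components are all assumptions chosen for the example.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption for illustration): y depends on x1 and x2 only.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 4)), columns=["x1", "x2", "x3", "x4"])
y = 3.0 * X["x1"] - 2.0 * X["x2"] + rng.normal(scale=0.1, size=200)

# Filter: rank features by absolute correlation with the target (no model fit).
correlations = X.corrwith(y).abs().sort_values(ascending=False)

# Embedded: Lasso (L1) shrinks the coefficients of unhelpful features towards zero
# while the model is being trained.
X_scaled = StandardScaler().fit_transform(X)
lasso = Lasso(alpha=0.1).fit(X_scaled, y)
lasso_coefs = pd.Series(lasso.coef_, index=X.columns)

# Feature extraction: PCA builds new components instead of picking existing columns.
X_pca = PCA(n_components=2).fit_transform(X_scaled)

print(correlations)
print(lasso_coefs)
print(X_pca.shape)
```

    Note that the filter and embedded methods keep a subset of the original columns, while PCA returns new columns that are combinations of all of them.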
    Feature selection using Wrapper method :-
    • If a variable you want to use as a feature (X) is categorical, you first have to convert (encode) it into dummy variables. For this we can use One-Hot encoding.
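    To see what the encoding produces, here is a small sketch on a toy table. The values are made up for illustration and are not taken from 50_Startups.csv.

```python
import pandas as pd

# Toy data (hypothetical values, not the real dataset).
toy = pd.DataFrame({
    "Spend": [10.0, 20.0, 30.0],
    "State": ["California", "Florida", "New York"],
})

# drop_first=True drops the first category (California here), because it is
# already implied when all the other dummy columns are 0.
# dtype=int gives 0/1 integers instead of booleans.
encoded = pd.get_dummies(toy, drop_first=True, dtype=int)

print(encoded.columns.tolist())
print(encoded)
```

    Numeric columns such as Spend pass through unchanged; only the categorical column is expanded.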
    Code :-
    import pandas as pd
    dataset = pd.read_csv('50_Startups.csv')
    dataset.head()

    dataset.columns
    y = dataset['Profit']
    X = dataset[['R&D Spend', 'Administration', 'Marketing Spend', 'State']]

    y.head()
    X.head()
    # Encode the categorical 'State' feature into dummy variables
    # (dtype=int gives 0/1 columns instead of booleans on newer pandas)
    X = pd.get_dummies(X, drop_first=True, dtype=int)

    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20 , random_state=42)
    from sklearn.linear_model import LinearRegression
    model = LinearRegression()
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    y_pred
    Output:- 
    array([126362.87908255,  84608.45383634,  99677.49425147,  46357.46068582,
           128750.48288504,  50912.4174188 , 109741.35032702, 100643.24281647,
            97599.27574594, 113097.42524432])
     
    y_test.head()
    Output:-
    13    134307.35
    39     81005.76
    30     99937.59
    45     64926.08
    17    125370.37
     
    model.coef_
    array([ 8.05630064e-01, -6.87878823e-02,  2.98554429e-02,  9.38793006e+02,
            6.98775997e+00])
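    A bare coefficient array is hard to read, so it can help to pair each value with its column name. The sketch below uses made-up data with a known relationship (the names X_demo, y_demo and demo_model are hypothetical), so the recovered coefficients can be checked by eye.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up data (assumption): profit is an exact linear function of two spends.
rng = np.random.default_rng(1)
X_demo = pd.DataFrame(
    rng.uniform(0, 100, size=(50, 2)),
    columns=["R&D Spend", "Marketing Spend"],
)
y_demo = 0.8 * X_demo["R&D Spend"] + 0.03 * X_demo["Marketing Spend"] + 5.0

demo_model = LinearRegression().fit(X_demo, y_demo)

# Label each coefficient with the feature it belongs to.
coef_by_feature = pd.Series(demo_model.coef_, index=X_demo.columns)

print(coef_by_feature)
print(demo_model.intercept_)
```

    The same one-liner, pd.Series(model.coef_, index=X.columns), should work with the model fitted above, since coef_ follows the column order of X.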
    • In this example the coefficients help us see where we should optimize, spend, or reduce investment to get the highest profit.
    • Next we will try to improve performance with dimensionality reduction, using the wrapper feature selection method.
    • So let's find out which features are unimportant or less important, and then remove them.
    • The general standard is: if a feature's P>|t| value is greater than 0.05 (the significance level), we can remove that feature.
    Code:-
    import statsmodels.api as sm
    model_ols = sm.OLS(endog=y, exog=X).fit()
    model_ols.summary()
    • But here the summary shows only 5 coefficients, which means the model has no bias (b in y = b + c1x1 + c2x2 + c3x3 + ...). That is not acceptable: without a bias the model can't predict accurately.
    • OLS doesn't add a bias automatically, hence we have to add a constant column ourselves.
    Code:-
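    import numpy as np
    # Column of ones (one per row, 50 here): its coefficient becomes the bias
    ones = np.ones((len(X), 1))
    # Prepend the ones column and keep the result as X_new
    X_new = np.append(arr=ones, values=X.astype(float), axis=1)
    model_ols = sm.OLS(endog=y, exog=X_new).fit()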
    model_ols.summary()
    • Now in the summary we can see that for the last feature, x5, the value of P>|t| is greater than 0.05 (SL), hence we can remove it.
    Code:-
    # Keep columns 0-4 (the constant plus x1..x4); this drops x5
    X_new = X_new[:, 0:5]
    model_ols = sm.OLS(endog=y, exog=X_new).fit()
    model_ols.summary()

    • We follow the same steps for every feature whose P>|t| is greater than 0.05.
    • When you eliminate (remove) a feature, the Adj. R-squared value should increase; if it decreases instead, you should not remove that feature.
