MLOps - Day 9

Today's learning :-
    • Started with a quick revision of the topics studied over the last few days.
    • Dimensionality Reduction - Feature Elimination
    • Feature Selection (FS) - Filter (Correlation), Embedded (Lasso/L1 Regularization), Wrapper (OLS - Ordinary Least Squares)
    • The filter method is faster but less accurate.
    • The embedded method (coefficient-based) is accurate but slower.
    • The wrapper method gives us accuracy together with reasonable speed and competitively high performance, which is the main reason it is used so often in feature selection. Note that it refits the model for each candidate set of features, so its cost grows with the number of features. In Deep Learning (DL) and Neural Networks (NN) a wrapper-style approach is also used in the background.
    • Feature Extraction (FE) - For this we use PCA (Principal Component Analysis). Why do we do feature extraction? One reason is performance: working with fewer dimensions can improve it.
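The filter and embedded approaches listed above can be sketched on a small made-up dataset. Everything here is an assumption for illustration: the column names `x1`, `x2`, `x3`, the synthetic target, and the choice of `alpha=0.1` are not from the session.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 200
# Hypothetical data: x1 and x2 drive the target, x3 is pure noise.
X = pd.DataFrame({
    "x1": rng.normal(size=n),
    "x2": rng.normal(size=n),
    "x3": rng.normal(size=n),
})
y = 3.0 * X["x1"] + 2.0 * X["x2"] + rng.normal(scale=0.1, size=n)

# Filter: rank features by absolute Pearson correlation with the target.
corr_with_target = X.corrwith(y).abs().sort_values(ascending=False)

# Embedded: Lasso (L1 penalty) shrinks unhelpful coefficients towards zero
# while the model is being fitted.
X_scaled = StandardScaler().fit_transform(X)
lasso = Lasso(alpha=0.1).fit(X_scaled, y)
lasso_coef = pd.Series(lasso.coef_, index=X.columns)

print(corr_with_target)
print(lasso_coef)
```

In both outputs the noise column should rank last: the filter needs no model at all, while Lasso does the selection as a side effect of training.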
    Feature selection using Wrapper method :-
    • If a variable you want to use as a feature (X) is categorical, you first have to convert (encode) it into dummy variables. For this we can use One-Hot encoding.
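As a quick illustration of what the encoding produces, here is a minimal sketch on three made-up rows. Only the column names mirror the dataset used in this post; the values are hypothetical.

```python
import pandas as pd

# Hypothetical rows; only the column names mirror the post's dataset.
df = pd.DataFrame({
    "R&D Spend": [1000.0, 2000.0, 1500.0],
    "State": ["New York", "California", "Florida"],
})

# drop_first=True drops one category (the first in sorted order) so the
# dummies are not perfectly collinear with the intercept.
# dtype=float keeps the result numeric (newer pandas defaults to bool).
encoded = pd.get_dummies(df, drop_first=True, dtype=float)
print(encoded.columns.tolist())
# ['R&D Spend', 'State_Florida', 'State_New York']
```

California becomes the baseline: a row with 0 in both dummy columns is a California row.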
    Code :-
    import pandas as pd
    dataset = pd.read_csv('50_Startups.csv')
    dataset.head()

    dataset.columns
    y = dataset['Profit']
    X = dataset[['R&D Spend', 'Administration', 'Marketing Spend', 'State']]

    y.head()
    X.head()
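    # Encode the categorical 'State' column as dummy variables.
    # dtype=float keeps the dummies numeric (newer pandas defaults to bool).
    X = pd.get_dummies(X, drop_first=True, dtype=float)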

    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20 , random_state=42)
    from sklearn.linear_model import LinearRegression
    model = LinearRegression()
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    y_pred
    Output:- 
    array([126362.87908255,  84608.45383634,  99677.49425147,  46357.46068582,
           128750.48288504,  50912.4174188 , 109741.35032702, 100643.24281647,
            97599.27574594, 113097.42524432])
     
    y_test.head()
    Output:-
    13    134307.35
    39     81005.76
    30     99937.59
    45     64926.08
    17    125370.37
     
    model.coef_
    array([ 8.05630064e-01, -6.87878823e-02,  2.98554429e-02,  9.38793006e+02,
            6.98775997e+00])
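A bare coefficient array is hard to read, and eyeballing y_pred against y_test only goes so far. The sketch below pairs each coefficient with its column name and scores the predictions with R². It runs on synthetic data that only borrows the column names from this post; the numbers used to generate it are assumptions, not values from 50_Startups.csv.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 50
# Made-up stand-in for the startup data (hypothetical values).
X = pd.DataFrame({
    "R&D Spend": rng.uniform(0, 160_000, size=n),
    "Administration": rng.uniform(50_000, 180_000, size=n),
    "Marketing Spend": rng.uniform(0, 470_000, size=n),
})
y = (50_000 + 0.8 * X["R&D Spend"] + 0.03 * X["Marketing Spend"]
     + rng.normal(scale=5_000, size=n))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42)
model = LinearRegression().fit(X_train, y_train)

# Label each coefficient with the feature it belongs to.
coef_by_feature = pd.Series(model.coef_, index=X.columns)
# R^2 on the held-out rows summarizes how close y_pred is to y_test.
r2 = r2_score(y_test, model.predict(X_test))

print(coef_by_feature)
print(f"intercept: {model.intercept_:.2f}, test R^2: {r2:.3f}")
```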
    • In this example the coefficients help us see where to increase, optimize or reduce investment in order to get the highest profit.
    • Next, we follow the steps below to improve performance through dimensionality reduction, using the wrapper method of feature selection.
    • So let's find out which features are unimportant or less important, and remove them.
    • The general standard is: if a feature's P>|t| value is greater than 0.05 (the significance level), we can remove that feature.
    Code:-
    import statsmodels.api as sm
    model_ols = sm.OLS(endog=y, exog=X).fit()
    model_ols.summary()
    • But in the summary we can see only 5 coefficients, one per feature, which means there is no bias term (b in y = b + c1x1 + c2x2 + c3x3). That is not acceptable: without a bias we can't predict accurately.
    • OLS doesn't add the bias automatically, so we have to add a constant column ourselves.
    Code:-
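    import numpy as np
    # Column of ones that acts as the bias (intercept) term
    ones = np.ones((len(X), 1))
    # Prepend it to the features and keep the result as X_new
    X_new = np.append(arr=ones, values=X.astype(float), axis=1)
    model_ols = sm.OLS(endog=y, exog=X_new).fit()
    model_ols.summary()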
    • This time the summary shows that the last feature, x5, has a P>|t| value greater than 0.05 (SL), so we can remove it.
    Code:-
    # Keep columns 0-4 (constant + x1..x4), which drops the last column x5
    X_new = X_new[:, 0:5]
    model_ols = sm.OLS(endog=y, exog=X_new).fit()
    model_ols.summary()

    • Follow the same procedure for every feature whose P>|t| value is greater than 0.05.
    • When you eliminate (remove) a feature, the Adj. R-squared value should increase; if it decreases, you should not remove that feature.
