MLOps - Day 8

Today's learning :-

  • A column/field can be a feature only if it helps you predict the target; otherwise it is not a feature (predictor).

  • In this example we can see that marks don't depend on Name or College_ID, which means these columns can't be my features.
  • Using the correlation method we check whether College_ID is a feature for marks or not.
  • Feature selection using correlation is known as the Filter approach to feature selection.
  • But sometimes it doesn't give proper results, so we have another approach: in that case we have to go for the Embedded technique of feature selection.
  • Using model coefficients is one way of doing feature selection with the Embedded technique.
  • Feature selection using coefficients is a slow technique, because first of all we have to create a model, then train the model, then find the model's coefficients, which is a long process (Create model > Train model > Find model coefficients = coefficient feature-selection technique).
  • If you want to do feature selection using the Embedded technique (because it is more accurate than the Filter technique) but don't want to use the coefficient technique because it is a slow process, then we have one more technique: "Lasso / L1 Regularization".
  • Lasso is a faster yet accurate technique of feature selection.
Code:-
from sklearn.linear_model import Lasso
from sklearn.feature_selection import SelectFromModel

# X = feature matrix, y = target column (prepared beforehand)
sel = SelectFromModel(Lasso())
sel.fit(X, y)
sel.get_support()   # boolean mask: True for the features Lasso kept
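The Filter (correlation) and Embedded (coefficient) approaches described above can be sketched side by side on a small made-up dataset. The columns `hours`, `college_id` and `marks` are hypothetical, chosen to mirror the marks example: `hours` actually drives `marks`, while `college_id` is just noise.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: 'hours' drives 'marks', 'college_id' is random noise
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'hours': rng.uniform(0, 10, 100),
    'college_id': rng.integers(1, 500, 100),
})
df['marks'] = 5 * df['hours'] + rng.normal(0, 2, 100)

# Filter approach: correlation of each candidate column with the target
print(df.corr()['marks'])   # 'hours' correlates strongly, 'college_id' does not

# Embedded approach: Create model > Train model > Find model coefficients
model = LinearRegression()
model.fit(df[['hours', 'college_id']], df['marks'])
print(model.coef_)          # large weight for 'hours', near-zero for 'college_id'
```

A near-zero coefficient (or near-zero correlation) for `college_id` is the signal that it is not a predictor and can be dropped.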
  • If we do experiments (hit & trial) on data and try to find a formula/output, this kind of research is known as Data Science, and the person who does this process is known as a data scientist.
  • If we change a string to a number (One -> 1), this process is known as encoding/transformation, and it is one example of Feature Engineering.
  • If you want your model to support your data, you have to do Feature Engineering.
  • The OneHot encoding technique can be used to transform your categorical data into new variables.
Code :- To predict startup company profit.
import pandas as pd

dataset = pd.read_csv('50_Startups.csv')
dataset.columns
X = dataset[['R&D Spend', 'Administration', 'Marketing Spend', 'State']]
y = dataset['Profit']

# One-hot encode the categorical 'State' column
state = X['State']
state_dummy = pd.get_dummies(state)   # columns come out alphabetically: California, Florida, New York
X_new = X.iloc[:, 0:3].copy()         # numeric columns; .copy() avoids SettingWithCopyWarning
f_state = state_dummy.iloc[:, :2]     # keep two of three dummies to avoid the dummy-variable trap
X_new[['California', 'Florida']] = f_state

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_new, y, test_size=0.20, random_state=42)

from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
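Since 50_Startups.csv may not be at hand, the same train/predict flow can be sketched end to end on synthetic numbers (all values below are hypothetical stand-ins for the R&D, Administration and Marketing columns), with an R² score to check how well the model did:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Hypothetical spend figures for 50 startups (3 numeric features)
rng = np.random.default_rng(42)
X = rng.uniform(0, 100000, size=(50, 3))
# Profit-like target driven mostly by the first column, plus noise
y = 0.8 * X[:, 0] + 0.1 * X[:, 2] + rng.normal(0, 5000, 50)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# R^2 close to 1.0 means the features explain the target well
score = r2_score(y_test, y_pred)
print(score)
```

On the real dataset you would pass `X_new` and `y` from the code above in place of the synthetic `X` and `y`.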

In this post I'll create multiple resources in AWS cloud using Terraform . Terraform is an infrastructure as code( IAC ) software which can do lots of things but it is superb in cloud automation. To use Terraform we have write code in a high-level configuration language known as Hashicorp Configuration Language , optionally we can write code in JSON as well. I'll create below service using Terraform- 1. Create the key-pair and security group which allow inbound traffic on port 80 and 22 2. Launch EC2 instance. 3. To create EC2 instance use same key and security group which created in step 1 4. Launch Volume(EBS) and mount this volume into /var/www/html directory 5. Upload index.php file and an image on GitHub repository 6. Clone GitHub repository into /var/www/html 7. Create S3 bucket, copy images from GitHub repo into it and set permission to public readable 8 Create a CloudFront use S3 bucket(which contains images) and use the CloudFront URL to update code in /var/w