MLOps - Day 15

Today's learning :-
Data Visualization:-

I'll continue learning data visualization. As we know, Python has lots of libraries for data visualization (matplotlib, seaborn, folium etc.). Today I'll learn one more of them, called 'seaborn'.
Seaborn is a Python library which helps us draw statistical graphs; in the background it uses matplotlib.
We do two types of operations on data: 1. Analysis - examining past data to understand what has already happened is known as data analysis. 2. Analytics - using data to predict what is likely to happen in the future is known as data analytics.
There are three types of analysis, depending on how many variables we look at together: Uni-variate (one variable), Bi-variate (two variables) and Multivariate (more than two variables). Graphs plotted for them are known as Uni-variate, Bi-variate and Multivariate distribution graphs.
Let's do some practicals of data visualization using the seaborn library -

Code :-
import seaborn as sns
sns.set()
tips = sns.load_dataset('tips')
tips.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 244 entries, 0 to 243
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype   
---  ------      --------------  -----   
 0   total_bill  244 non-null    float64 
 1   tip         244 non-null    float64 
 2   sex         244 non-null    category
 3   smoker      244 non-null    category
 4   day         244 non-null    category
 5   time        244 non-null    category
 6   size        244 non-null    int64   
dtypes: category(4), float64(2), int64(1)
memory usage: 7.3 KB

tips.head(5)
tips.columns

Index(['total_bill', 'tip', 'sex', 'smoker', 'day', 'time', 'size'], dtype='object')

bill = tips['total_bill']
bill.head()

0    16.99
1    10.34
2    21.01
3    23.68
4    24.59
Name: total_bill, dtype: float64

bill.max()

50.81

bill.min()

3.07

sns.distplot(bill)

<matplotlib.axes._subplots.AxesSubplot at 0xe2cacc8>

# Univariate

sns.distplot(bill, kde=False, bins=50)

<matplotlib.axes._subplots.AxesSubplot at 0xe415548>

# Bivariate
sns.jointplot(data=tips, x='total_bill', y='tip', kind='scatter')

<seaborn.axisgrid.JointGrid at 0xed40fc8>
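
For the multivariate case, one common approach is to keep the same scatter plot and map a third variable to colour with the hue parameter. The sketch below assumes a tiny hand-made DataFrame with the same column names as tips; the tip and sex values in it are only illustrative, not taken from the dataset.

```python
import matplotlib
matplotlib.use("Agg")  # only needed when running as a script without a display
import pandas as pd
import seaborn as sns

# Hand-made sample with tips-like column names (values are illustrative)
sample = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.0, 1.5, 3.5, 3.0, 3.5],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
})

# Multivariate: x and y are numeric, hue colours the points by a third column
ax = sns.scatterplot(data=sample, x="total_bill", y="tip", hue="sex")

# Collect the legend entries created for the hue variable
hue_labels = [text.get_text() for text in ax.get_legend().get_texts()]
```

On the real dataset the same call would be sns.scatterplot(data=tips, x='total_bill', y='tip', hue='sex').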

Like this, you can check the seaborn documentation for more details and options.

After this, I jumped to classification in machine learning.

If the value we want to predict is not continuous and we only have to decide the probability of something (whether it will happen or not), then we solve this kind of use case with the classification approach instead of regression.
There are two types of classification: binary and multi-class classification.
As mentioned, the model gives a probability between 0 and 1.
In binary classification we have to set a cut-off point to decide whether something will happen or not.
To solve binary classification use cases we can use logistic regression from sklearn (LogisticRegression), and it uses the sigmoid function 1/(1 + e^-x) in the background.
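
To make the sigmoid and the cut-off point concrete, here is a minimal hand-written sketch using only the standard library. It is not how sklearn is implemented internally; the function names, the sample scores and the 0.5 cut-off are assumptions chosen for illustration.

```python
import math

def sigmoid(x):
    """Squash any real number into a probability between 0 and 1."""
    # Simple form of 1 / (1 + e^-x); very large negative x can overflow math.exp
    return 1.0 / (1.0 + math.exp(-x))

def predict_label(score, cutoff=0.5):
    """Return 1 (will happen) if the probability reaches the cut-off, else 0."""
    return 1 if sigmoid(score) >= cutoff else 0

# Hypothetical raw scores, e.g. the output of a linear model
scores = [-2.0, 0.0, 3.0]
probabilities = [sigmoid(s) for s in scores]
labels = [predict_label(s) for s in scores]
```

A score of 0 maps to a probability of exactly 0.5, so with a cut-off of 0.5 the labels here come out as 0, 1 and 1. Raising the cut-off makes the model more reluctant to predict 1.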

Rakesh Kumar
