Today's learning :-
For more details and options, you can check the seaborn documentation.
Data Visualization:-
- I'll continue learning data visualization. As we know, Python has lots of libraries for data visualization (matplotlib, seaborn, folium, etc.). Today I'll learn one of them: 'seaborn'.
- Seaborn is a Python library that helps us create statistical graphs; in the background it uses matplotlib.
- We can do two types of operations on data: 1. Analysis: working on past data to understand what happened is known as data analysis. 2. Analytics: using data to predict or decide about the future is known as data analytics.
- Depending on how many variables we study at once, there are three types of analysis in machine learning: univariate (one variable), bivariate (two variables) and multivariate (more than two variables). Graphs plotted for them are known as univariate, bivariate and multivariate distribution graphs.
- Let's do some practicals of data visualization using the seaborn library-
Code :-
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()  # apply seaborn's default theme to all plots
tips = sns.load_dataset('tips')  # sample dataset, downloaded on first use
tips.info()
# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 244 entries, 0 to 243
# Data columns (total 7 columns):
#  #   Column      Non-Null Count  Dtype
# ---  ------      --------------  -----
#  0   total_bill  244 non-null    float64
#  1   tip         244 non-null    float64
#  2   sex         244 non-null    category
#  3   smoker      244 non-null    category
#  4   day         244 non-null    category
#  5   time        244 non-null    category
#  6   size        244 non-null    int64
# dtypes: category(4), float64(2), int64(1)
# memory usage: 7.3 KB
tips.head(5)
tips.columns
# Index(['total_bill', 'tip', 'sex', 'smoker', 'day', 'time', 'size'], dtype='object')
# Univariate: one variable (total_bill)
bill = tips['total_bill']
bill.head()
# 0    16.99
# 1    10.34
# 2    21.01
# 3    23.68
# 4    24.59
# Name: total_bill, dtype: float64
bill.max()  # 50.81
bill.min()  # 3.07
sns.histplot(bill, kde=True, stat='density')  # replaces the deprecated sns.distplot(bill)
plt.show()
sns.histplot(bill, bins=50)  # replaces sns.distplot(bill, kde=False, bins=50)
plt.show()
# Bivariate: two variables (total_bill vs tip)
sns.jointplot(data=tips, x='total_bill', y='tip', kind='scatter')
plt.show()
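What bins=50 means for total_bill can be worked out by hand: with equal-width bins, each bin covers (max - min) / bins of the range. This sketch reuses the min and max printed above and, as sample data, only the five bill.head() values, so it runs without the dataset download. np.histogram is NumPy's histogram-counting function; seaborn's own bin placement may differ slightly, so treat this as an illustration of the idea rather than of seaborn's exact internals.

```python
import numpy as np
import pandas as pd

# Values taken from the outputs above
bill_min, bill_max = 3.07, 50.81
n_bins = 50

# Equal-width bins: every bin covers the same slice of the range
bin_width = (bill_max - bill_min) / n_bins
edges = np.linspace(bill_min, bill_max, n_bins + 1)  # 51 edges -> 50 bins
print(round(bin_width, 4))  # 0.9548

# Only the first five bills (from bill.head()), not the whole column
head = pd.Series([16.99, 10.34, 21.01, 23.68, 24.59], name="total_bill")
counts, _ = np.histogram(head, bins=edges)
print(counts.sum())  # 5: each value falls into exactly one bin

# Univariate summary: count, mean, std, min, quartiles, max
print(head.describe())
```

More bins means narrower bins: a finer picture of the distribution, but noisier when there is little data.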
After this, I jumped on to classification in machine learning.
- If the target is not continuous and we only need to decide whether something will happen or not (using a probability), we solve the use-case with a classification approach instead of regression.
- There are two types of classification: binary and multi-class classification.
- As mentioned, it gives a probability between 0 and 1 (how likely the event is to happen).
- In binary classification we have to set a cut-off point (threshold) on that probability to decide whether something will happen (1) or not (0).
- To solve binary classification use-cases we can use logistic regression, available in sklearn as LogisticRegression (in sklearn.linear_model). In the background it uses the sigmoid function 1 / (1 + e^(-x)), which squeezes any number into the range 0 to 1.
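The sigmoid and cut-off steps above can be sketched in a few lines, assuming scikit-learn and NumPy are installed. The data (hours studied vs. passed the exam) is hypothetical, and apply_cutoff is a hypothetical helper name, not part of sklearn. The sketch checks that, for a binary problem, LogisticRegression.predict_proba equals the sigmoid of decision_function (the linear score w*x + b), and then turns the probabilities into 0/1 labels with a chosen cut-off.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def sigmoid(z):
    """Sigmoid function 1 / (1 + e^(-z)): squeezes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def apply_cutoff(probabilities, cutoff=0.5):
    """Hypothetical helper: label 1 where probability >= cutoff, else 0."""
    return (np.asarray(probabilities) >= cutoff).astype(int)


# Hypothetical toy data: hours studied -> passed the exam (1) or not (0)
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)

# Probability of class 1 ("will happen") for each sample
proba = model.predict_proba(X)[:, 1]

# Same numbers by hand: sigmoid of the linear score w*x + b
manual = sigmoid(model.decision_function(X))
print(np.allclose(proba, manual))  # True

# Cut-off step: the same probabilities can give different labels
print(apply_cutoff(proba, cutoff=0.5))
print(apply_cutoff(proba, cutoff=0.9))
```

Raising the cut-off makes the model say "will happen" less often; which value is right depends on the use-case (for example, how costly a false alarm is compared with a miss).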