Commit 2550b044 authored by Céline Meillier

Upload notebook 0-IntroductionDataScience

parent 4ca59dd0
{
"cells": [
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"import numpy as np \n",
"import csv\n",
"import matplotlib.pyplot as plt\n",
"import pandas as pd\n",
"import seaborn as sns\n",
"sns.set(color_codes = True)\n",
"from scipy import stats\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# 1. What is data science?\n",
"\n",
"**Definitions of *donnée* (datum) from the [Larousse dictionary](https://www.larousse.fr/dictionnaires/francais/donnée/26436):**\n",
"\n",
"* That which is known or accepted as such, on which reasoning can be based, and which serves as the starting point for an investigation.\n",
"* A piece of information that serves as a basis.\n",
"* A conventional representation of information for computer processing.\n",
"* In a mathematics problem, a hypothesis given in the statement.\n",
"* Results of observations or experiments, made deliberately or in the course of other tasks, and subjected to statistical methods."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"**Data science**\n",
"\n",
"Data science is transdisciplinary by nature and draws on several kinds of knowledge:\n",
"* the discipline of the application under study (physics, chemistry, humanities and social sciences, astronomy, biology, etc.)\n",
"* mathematics, for modelling and for extracting information\n",
"* computer science, for collecting, backing up, storing, processing and representing data\n",
"\n",
"The aim of data science is to extract information from a mass of data, then to summarize and represent that information. The knowledge it requires can be grouped into branches. Swami Chandrasekaran illustrates this as a metro map in which each line represents a branch or discipline of data science."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"![DataScienceMap](http://nirvacana.com/thoughts/wp-content/uploads/2018/01/RoadToDataScientist1.png)\n",
"Image source: `http://nirvacana.com/thoughts/2013/07/08/becoming-a-data-scientist/`."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### 1.1. What are data for?\n",
"\n",
"Data are used to answer a question. The question must therefore be stated clearly, and the data must be modelled and then interpreted in order to answer it.\n",
"\n",
"Data alone are not enough to provide an answer: part of the data analyst's job is to bring in prior knowledge about the data."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### 1.2. An example dataset: `movie_metadata.csv`\n",
"\n",
"All the concepts covered in this course are illustrated on this real dataset. It is an extract from a database published on [kaggle](https://www.kaggle.com) (kaggle has since updated the database under the name [TMdb](https://www.kaggle.com/tmdb/tmdb-movie-metadata)).\n",
"\n",
"**Loading the data with Pandas**\n",
"\n",
"We use the Pandas package ([Python Data Analysis Library](https://pandas.pydata.org)), which provides the DataFrame object in which we will store the dataset under study. The DataFrame object comes with many methods for:\n",
"* retrieving and formatting data\n",
"* cleaning data (preprocessing)\n",
"* representing data graphically and numerically\n",
"* analysing data statistically\n",
"\n",
"See the documentation for the methods of the DataFrame class: http://pandas.pydata.org/pandas-docs/stable/reference/frame.html"
]
},
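{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"As a rough illustration of the DataFrame capabilities listed above, here is a minimal sketch on a tiny hand-built table rather than on the course file. The name `toy` and the deliberately missing budget are assumptions made for the example; the other values are copied from the first rows displayed by `DATA.head`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical three-film table, indexed by title like the course dataset\n",
"toy = pd.DataFrame(\n",
"    {\n",
"        'director_name': ['Frank Darabont', 'Christopher Nolan', 'Christopher Nolan'],\n",
"        'duration': [142, 152, 148],\n",
"        # The last budget is removed on purpose to demonstrate cleaning\n",
"        'budget': [25000000, 185000000, np.nan],\n",
"    },\n",
"    index=pd.Index(['The Shawshank Redemption', 'The Dark Knight', 'Inception'], name='movie_title'),\n",
")\n",
"\n",
"# Retrieval: select one value by row label and column name\n",
"inception_duration = toy.loc['Inception', 'duration']\n",
"print(inception_duration)\n",
"\n",
"# Cleaning: drop the rows that contain a missing value\n",
"clean = toy.dropna()\n",
"print(len(clean))\n",
"\n",
"# Numerical summary: count, mean, quartiles, etc. of the numeric columns\n",
"print(toy.describe())"
]
},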
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"Number of films in the dataset: 4125\n"
]
}
],
"source": [
"# Load the semicolon-separated file, using the film title as the row index\n",
"DATA = pd.read_csv('data/movie_metadata2.csv', delimiter=';', index_col='movie_title')\n",
"print(type(DATA))\n",
"# One row per film, so the number of rows is the number of films\n",
"print('Number of films in the dataset: ' + str(len(DATA)))"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>director_name</th>\n",
" <th>num_critic_for_reviews</th>\n",
" <th>duration</th>\n",
" <th>actor_1_name</th>\n",
" <th>actor_2_name</th>\n",
" <th>num_voted_users</th>\n",
" <th>facenumber_in_poster</th>\n",
" <th>num_user_for_reviews</th>\n",
" <th>language</th>\n",
" <th>country</th>\n",
" <th>content_rating</th>\n",
" <th>budget</th>\n",
" <th>title_year</th>\n",
" <th>imdb_score</th>\n",
" </tr>\n",
" <tr>\n",
" <th>movie_title</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>The Shawshank Redemption</th>\n",
" <td>Frank Darabont</td>\n",
" <td>199</td>\n",
" <td>142</td>\n",
" <td>Morgan Freeman</td>\n",
" <td>Jeffrey DeMunn</td>\n",
" <td>1689764</td>\n",
" <td>0</td>\n",
" <td>4144</td>\n",
" <td>English</td>\n",
" <td>USA</td>\n",
" <td>R</td>\n",
" <td>25000000</td>\n",
" <td>1994</td>\n",
" <td>9.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>The Dark Knight</th>\n",
" <td>Christopher Nolan</td>\n",
" <td>645</td>\n",
" <td>152</td>\n",
" <td>Christian Bale</td>\n",
" <td>Heath Ledger</td>\n",
" <td>1676169</td>\n",
" <td>0</td>\n",
" <td>4667</td>\n",
" <td>English</td>\n",
" <td>USA</td>\n",
" <td>PG-13</td>\n",
" <td>185000000</td>\n",
" <td>2008</td>\n",
" <td>9.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Inception</th>\n",
" <td>Christopher Nolan</td>\n",
" <td>642</td>\n",
" <td>148</td>\n",
" <td>Leonardo DiCaprio</td>\n",
" <td>Tom Hardy</td>\n",
" <td>1468200</td>\n",
" <td>0</td>\n",
" <td>2803</td>\n",
" <td>English</td>\n",
" <td>USA</td>\n",
" <td>PG-13</td>\n",
" <td>160000000</td>\n",
" <td>2010</td>\n",
" <td>8.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Fight Club</th>\n",
" <td>David Fincher</td>\n",
" <td>315</td>\n",
" <td>151</td>\n",
" <td>Brad Pitt</td>\n",
" <td>Meat Loaf</td>\n",
" <td>1347461</td>\n",
" <td>2</td>\n",
" <td>2968</td>\n",
" <td>English</td>\n",
" <td>USA</td>\n",
" <td>R</td>\n",
" <td>63000000</td>\n",
" <td>1999</td>\n",
" <td>8.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Pulp Fiction</th>\n",
" <td>Quentin Tarantino</td>\n",
" <td>215</td>\n",
" <td>178</td>\n",
" <td>Bruce Willis</td>\n",
" <td>Eric Stoltz</td>\n",
" <td>1324680</td>\n",
" <td>1</td>\n",
" <td>2195</td>\n",
" <td>English</td>\n",
" <td>USA</td>\n",
" <td>R</td>\n",
" <td>8000000</td>\n",
" <td>1994</td>\n",
" <td>8.9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Forrest Gump</th>\n",
" <td>Robert Zemeckis</td>\n",
" <td>149</td>\n",
" <td>142</td>\n",
" <td>Tom Hanks</td>\n",
" <td>Siobhan Fallon Hogan</td>\n",
" <td>1251222</td>\n",
" <td>0</td>\n",
" <td>1398</td>\n",
" <td>English</td>\n",
" <td>USA</td>\n",
" <td>PG-13</td>\n",
" <td>55000000</td>\n",
" <td>1994</td>\n",
" <td>8.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>The Lord of the Rings: The Fellowship of the Ring</th>\n",
" <td>Peter Jackson</td>\n",
" <td>297</td>\n",
" <td>171</td>\n",
" <td>Christopher Lee</td>\n",
" <td>Orlando Bloom</td>\n",
" <td>1238746</td>\n",
" <td>2</td>\n",
" <td>5060</td>\n",
" <td>English</td>\n",
" <td>New Zealand</td>\n",
" <td>PG-13</td>\n",
" <td>93000000</td>\n",
" <td>2001</td>\n",
" <td>8.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>The Matrix</th>\n",
" <td>Lana Wachowski</td>\n",
" <td>313</td>\n",
" <td>136</td>\n",
" <td>Keanu Reeves</td>\n",
" <td>Marcus Chong</td>\n",
" <td>1217752</td>\n",
" <td>3</td>\n",
" <td>3646</td>\n",
" <td>English</td>\n",
" <td>USA</td>\n",
" <td>R</td>\n",
" <td>63000000</td>\n",
" <td>1999</td>\n",
" <td>8.7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>The Lord of the Rings: The Return of the King</th>\n",
" <td>Peter Jackson</td>\n",
" <td>328</td>\n",
" <td>192</td>\n",
" <td>Orlando Bloom</td>\n",
" <td>Billy Boyd</td>\n",
" <td>1215718</td>\n",
" <td>2</td>\n",
" <td>3189</td>\n",
" <td>English</td>\n",
" <td>USA</td>\n",
" <td>PG-13</td>\n",
" <td>94000000</td>\n",
" <td>2003</td>\n",
" <td>8.9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>The Godfather</th>\n",
" <td>Francis Ford Coppola</td>\n",
" <td>208</td>\n",
" <td>175</td>\n",
" <td>Al Pacino</td>\n",
" <td>Marlon Brando</td>\n",
" <td>1155770</td>\n",
" <td>1</td>\n",
" <td>2238</td>\n",
" <td>English</td>\n",
" <td>USA</td>\n",
" <td>R</td>\n",
" <td>6000000</td>\n",
" <td>1972</td>\n",
" <td>9.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>The Dark Knight Rises</th>\n",
" <td>Christopher Nolan</td>\n",
" <td>813</td>\n",
" <td>164</td>\n",
" <td>Tom Hardy</td>\n",
" <td>Christian Bale</td>\n",
" <td>1144337</td>\n",
" <td>0</td>\n",
" <td>2701</td>\n",
" <td>English</td>\n",
" <td>USA</td>\n",
" <td>PG-13</td>\n",
" <td>250000000</td>\n",
" <td>2012</td>\n",
" <td>8.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>The Lord of the Rings: The Two Towers</th>\n",
" <td>Peter Jackson</td>\n",
" <td>294</td>\n",
" <td>172</td>\n",
" <td>Christopher Lee</td>\n",
" <td>Orlando Bloom</td>\n",
" <td>1100446</td>\n",
" <td>1</td>\n",
" <td>2417</td>\n",
" <td>English</td>\n",
" <td>USA</td>\n",
" <td>PG-13</td>\n",
" <td>94000000</td>\n",
" <td>2002</td>\n",
" <td>8.7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Se7en</th>\n",
" <td>David Fincher</td>\n",
" <td>216</td>\n",
" <td>127</td>\n",
" <td>Morgan Freeman</td>\n",
" <td>Brad Pitt</td>\n",
" <td>1023511</td>\n",
" <td>0</td>\n",
" <td>1080</td>\n",
" <td>English</td>\n",
" <td>USA</td>\n",
" <td>R</td>\n",
" <td>33000000</td>\n",
" <td>1995</td>\n",
" <td>8.6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>The Avengers</th>\n",
" <td>Joss Whedon</td>\n",
" <td>703</td>\n",
" <td>173</td>\n",
" <td>Chris Hemsworth</td>\n",
" <td>Robert Downey Jr.</td>\n",
" <td>995415</td>\n",
" <td>3</td>\n",
" <td>1722</td>\n",
" <td>English</td>\n",
" <td>USA</td>\n",
" <td>PG-13</td>\n",
" <td>220000000</td>\n",
" <td>2012</td>\n",
" <td>8.1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Gladiator</th>\n",
" <td>Ridley Scott</td>\n",
" <td>265</td>\n",
" <td>171</td>\n",
" <td>Djimon Hounsou</td>\n",
" <td>Connie Nielsen</td>\n",
" <td>982637</td>\n",
" <td>0</td>\n",
" <td>2368</td>\n",
" <td>English</td>\n",
" <td>USA</td>\n",
" <td>R</td>\n",
" <td>103000000</td>\n",
" <td>2000</td>\n",
" <td>8.5</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" director_name \\\n",
"movie_title \n",
"The Shawshank Redemption Frank Darabont \n",
"The Dark Knight Christopher Nolan \n",
"Inception Christopher Nolan \n",
"Fight Club David Fincher \n",
"Pulp Fiction Quentin Tarantino \n",
"Forrest Gump Robert Zemeckis \n",
"The Lord of the Rings: The Fellowship of the Ring Peter Jackson \n",
"The Matrix Lana Wachowski \n",
"The Lord of the Rings: The Return of the King Peter Jackson \n",
"The Godfather Francis Ford Coppola \n",
"The Dark Knight Rises Christopher Nolan \n",
"The Lord of the Rings: The Two Towers Peter Jackson \n",
"Se7en David Fincher \n",
"The Avengers Joss Whedon \n",
"Gladiator Ridley Scott \n",
"\n",
" num_critic_for_reviews \\\n",
"movie_title \n",
"The Shawshank Redemption 199 \n",
"The Dark Knight 645 \n",
"Inception 642 \n",
"Fight Club 315 \n",
"Pulp Fiction 215 \n",
"Forrest Gump 149 \n",
"The Lord of the Rings: The Fellowship of the Ring 297 \n",
"The Matrix 313 \n",
"The Lord of the Rings: The Return of the King 328 \n",
"The Godfather 208 \n",
"The Dark Knight Rises 813 \n",
"The Lord of the Rings: The Two Towers 294 \n",
"Se7en 216 \n",
"The Avengers 703 \n",
"Gladiator 265 \n",
"\n",
" duration \\\n",
"movie_title \n",
"The Shawshank Redemption 142 \n",
"The Dark Knight 152 \n",
"Inception 148 \n",
"Fight Club 151 \n",
"Pulp Fiction 178 \n",
"Forrest Gump 142 \n",
"The Lord of the Rings: The Fellowship of the Ring 171 \n",
"The Matrix 136 \n",
"The Lord of the Rings: The Return of the King 192 \n",
"The Godfather 175 \n",
"The Dark Knight Rises 164 \n",
"The Lord of the Rings: The Two Towers 172 \n",
"Se7en 127 \n",
"The Avengers 173 \n",
"Gladiator 171 \n",
"\n",
" actor_1_name \\\n",
"movie_title \n",
"The Shawshank Redemption Morgan Freeman \n",
"The Dark Knight Christian Bale \n",
"Inception Leonardo DiCaprio \n",
"Fight Club Brad Pitt \n",
"Pulp Fiction Bruce Willis \n",
"Forrest Gump Tom Hanks \n",
"The Lord of the Rings: The Fellowship of the Ring Christopher Lee \n",
"The Matrix Keanu Reeves \n",
"The Lord of the Rings: The Return of the King Orlando Bloom \n",
"The Godfather Al Pacino \n",
"The Dark Knight Rises Tom Hardy \n",
"The Lord of the Rings: The Two Towers Christopher Lee \n",
"Se7en Morgan Freeman \n",
"The Avengers Chris Hemsworth \n",
"Gladiator Djimon Hounsou \n",
"\n",
" actor_2_name \\\n",
"movie_title \n",
"The Shawshank Redemption Jeffrey DeMunn \n",
"The Dark Knight Heath Ledger \n",
"Inception Tom Hardy \n",
"Fight Club Meat Loaf \n",
"Pulp Fiction Eric Stoltz \n",
"Forrest Gump Siobhan Fallon Hogan \n",
"The Lord of the Rings: The Fellowship of the Ring Orlando Bloom \n",
"The Matrix Marcus Chong \n",
"The Lord of the Rings: The Return of the King Billy Boyd \n",
"The Godfather Marlon Brando \n",
"The Dark Knight Rises Christian Bale \n",
"The Lord of the Rings: The Two Towers Orlando Bloom \n",
"Se7en Brad Pitt \n",
"The Avengers Robert Downey Jr. \n",
"Gladiator Connie Nielsen \n",
"\n",
" num_voted_users \\\n",
"movie_title \n",
"The Shawshank Redemption 1689764 \n",
"The Dark Knight 1676169 \n",
"Inception 1468200 \n",
"Fight Club 1347461 \n",
"Pulp Fiction 1324680 \n",
"Forrest Gump 1251222 \n",
"The Lord of the Rings: The Fellowship of the Ring 1238746 \n",
"The Matrix 1217752 \n",
"The Lord of the Rings: The Return of the King 1215718 \n",
"The Godfather 1155770 \n",
"The Dark Knight Rises 1144337 \n",
"The Lord of the Rings: The Two Towers 1100446 \n",
"Se7en 1023511 \n",
"The Avengers 995415 \n",
"Gladiator 982637 \n",
"\n",
" facenumber_in_poster \\\n",
"movie_title \n",
"The Shawshank Redemption 0 \n",
"The Dark Knight 0 \n",
"Inception 0 \n",
"Fight Club 2 \n",
"Pulp Fiction 1 \n",
"Forrest Gump 0 \n",
"The Lord of the Rings: The Fellowship of the Ring 2 \n",
"The Matrix 3 \n",
"The Lord of the Rings: The Return of the King 2 \n",
"The Godfather 1 \n",
"The Dark Knight Rises 0 \n",
"The Lord of the Rings: The Two Towers 1 \n",
"Se7en 0 \n",
"The Avengers 3 \n",
"Gladiator 0 \n",
"\n",
" num_user_for_reviews \\\n",
"movie_title \n",
"The Shawshank Redemption 4144 \n",
"The Dark Knight 4667 \n",
"Inception 2803 \n",
"Fight Club 2968 \n",
"Pulp Fiction 2195 \n",
"Forrest Gump 1398 \n",
"The Lord of the Rings: The Fellowship of the Ring 5060 \n",
"The Matrix 3646 \n",
"The Lord of the Rings: The Return of the King 3189 \n",
"The Godfather 2238 \n",
"The Dark Knight Rises 2701 \n",
"The Lord of the Rings: The Two Towers 2417 \n",
"Se7en 1080 \n",
"The Avengers 1722 \n",
"Gladiator 2368 \n",
"\n",
" language country \\\n",
"movie_title \n",
"The Shawshank Redemption English USA \n",
"The Dark Knight English USA \n",
"Inception English USA \n",
"Fight Club English USA \n",
"Pulp Fiction English USA \n",
"Forrest Gump English USA \n",
"The Lord of the Rings: The Fellowship of the Ring English New Zealand \n",
"The Matrix English USA \n",
"The Lord of the Rings: The Return of the King English USA \n",
"The Godfather English USA \n",
"The Dark Knight Rises English USA \n",
"The Lord of the Rings: The Two Towers English USA \n",
"Se7en English USA \n",
"The Avengers English USA \n",
"Gladiator English USA \n",
"\n",
" content_rating budget \\\n",
"movie_title \n",
"The Shawshank Redemption R 25000000 \n",
"The Dark Knight PG-13 185000000 \n",
"Inception PG-13 160000000 \n",
"Fight Club R 63000000 \n",
"Pulp Fiction R 8000000 \n",
"Forrest Gump PG-13 55000000 \n",
"The Lord of the Rings: The Fellowship of the Ring PG-13 93000000 \n",
"The Matrix R 63000000 \n",
"The Lord of the Rings: The Return of the King PG-13 94000000 \n",
"The Godfather R 6000000 \n",
"The Dark Knight Rises PG-13 250000000 \n",
"The Lord of the Rings: The Two Towers PG-13 94000000 \n",
"Se7en R 33000000 \n",
"The Avengers PG-13 220000000 \n",
"Gladiator R 103000000 \n",
"\n",
" title_year imdb_score \n",
"movie_title \n",
"The Shawshank Redemption 1994 9.3 \n",
"The Dark Knight 2008 9.0 \n",
"Inception 2010 8.8 \n",
"Fight Club 1999 8.8 \n",
"Pulp Fiction 1994 8.9 \n",
"Forrest Gump 1994 8.8 \n",
"The Lord of the Rings: The Fellowship of the Ring 2001 8.8 \n",
"The Matrix 1999 8.7 \n",
"The Lord of the Rings: The Return of the King 2003 8.9 \n",
"The Godfather 1972 9.2 \n",
"The Dark Knight Rises 2012 8.5 \n",
"The Lord of the Rings: The Two Towers 2002 8.7 \n",
"Se7en 1995 8.6 \n",
"The Avengers 2012 8.1 \n",
"Gladiator 2000 8.5 "
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"DATA.head(n = 15)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# 2. Useful vocabulary\n",
"\n",
"So that we all speak the same language, here are a few definitions and reminders from statistics:\n",
"\n",
"**Population:** the set of individuals or objects under study.\n",
"\n",
"<span style=\"color: #27AE60\">example:</span>\n",
"\n",
"**Individual:** an element of a population.\n",
"\n",
"<span style=\"color: #27AE60\">example:</span>\n",
"\n",
"**Variable(s):** also called characteristic(s). A variable $X$ represents a\n",
"characteristic whose value is observed or recorded for each\n",
"individual of a population. A population can be described by\n",
"one variable (univariate analysis) or by several variables (multivariate\n",
"analysis). The statistical (or random) variable is written\n",
"in upper case, $X$; the values it takes are written in lower case, $(x_1, x_2, \\cdots)$.\n",
"\n",
"<span style=\"color: #27AE60\">example:</span>\n",
"\n",
"**Values:** the numerical values or categories $(x_1, x_2, \\cdots)$ taken by the\n",
"variable of interest $X$ for the $N$ individuals of the population.\n",
"\n",
"<span style=\"color: #27AE60\">example:</span>"
]
},
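{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"One possible way to read the vocabulary above on a table of films is sketched below. The mapping (rows as individuals, a column as a variable) and the names `films`, `individual` and `values` are assumptions chosen for illustration; the four rows are copied from the table displayed by `DATA.head`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical population of four films (one row per individual)\n",
"films = pd.DataFrame(\n",
"    {\n",
"        'duration': [142, 152, 148, 151],\n",
"        'content_rating': ['R', 'PG-13', 'PG-13', 'R'],\n",
"    },\n",
"    index=pd.Index(\n",
"        ['The Shawshank Redemption', 'The Dark Knight', 'Inception', 'Fight Club'],\n",
"        name='movie_title',\n",
"    ),\n",
")\n",
"\n",
"N = len(films)                        # size of the population\n",
"individual = films.loc['Fight Club']  # one individual: a single row\n",
"X = films['duration']                 # one variable: a single column\n",
"values = X.tolist()                   # the values x_1, ..., x_N taken by X\n",
"\n",
"print(N)\n",
"print(individual['content_rating'])\n",
"print(values)"
]
},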
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"**Count (absolute frequency):** the number $n_i$ of individuals in the population for which the variable $X$\n",
"takes a given value $x_i$.\n",
"\n",
"<span style=\"color: #27AE60\">example:</span>\n",
"\n",
"**Frequency (relative frequency):** the frequency $f_i$ associated with the value $x_i$ is the ratio of the count of\n",
"that value to the population size $N$, that is $f_i = n_i / N$. The frequencies\n",
"sum to 1 (or to 100 when working in $\\%$).\n",
"\n",
"<span style=\"color: #27AE60\">example:</span>\n",
"\n",
"\n",
"**Range:** Let $(x_1, x_2, \\cdots, x_N)$ be a set of values. Let $x_{max}$ denote the\n",
"largest value of the set $(x_1, x_2, \\cdots, x_N)$ and $x_{min}$ the smallest. The\n",
"range of the set of values $(x_1, x_2, \\cdots, x_N)$ is the difference\n",
"$x_{max} - x_{min}$.\n",
"\n",
"<span style=\"color: #27AE60\">example:</span>"
]
},
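{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"The three definitions above can be computed in a few lines. This is only a sketch: the two short series hold the `content_rating` and `duration` of the first eight films displayed by `DATA.head`, and the names `ratings`, `durations`, `counts`, `freqs` and `value_range` are chosen for the example."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Categorical variable: rating of the first eight films\n",
"ratings = pd.Series(['R', 'PG-13', 'PG-13', 'R', 'R', 'PG-13', 'PG-13', 'R'], name='content_rating')\n",
"# Numerical variable: duration (in minutes) of the same films\n",
"durations = pd.Series([142, 152, 148, 151, 178, 142, 171, 136], name='duration')\n",
"\n",
"N = len(ratings)                 # population size\n",
"counts = ratings.value_counts()  # count n_i of each value x_i\n",
"freqs = counts / N               # frequency f_i = n_i / N\n",
"\n",
"# Range: largest value minus smallest value\n",
"value_range = durations.max() - durations.min()\n",
"\n",
"print(counts)\n",
"print(freqs)\n",
"print(value_range)"
]
},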
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"**Names of the stored variables**"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Index(['director_name', 'num_critic_for_reviews', 'duration', 'actor_1_name',\n",
" 'actor_2_name', 'num_voted_users', 'facenumber_in_poster',\n",
" 'num_user_for_reviews', 'language', 'country', 'content_rating',\n",
" 'budget', 'title_year', 'imdb_score'],\n",
" dtype='object')\n"
]
}
],
"source": [
"print(DATA.keys())"
]
},
{
"cell_type": "markdown",
"metadata": {