{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Data Wrangling Project\n", "\n", "#### _by Tatiana Kurilo_ " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The data gathering, assessing and cleaning stages are documented in [wrangle_report.html](wrangle_report.html). \n", "Parts of the data analysis and visualisations are presented in a more \"reader-friendly\" way in [act_report.html](act_report.html).\n", "\n", "## Table of Contents\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "# Data Gathering" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# imports\n", "\n", "import os\n", "import time\n", "import requests\n", "import pandas as pd\n", "import tweepy\n", "import json\n", "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Loading Twitter Archive Data Locally" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I downloaded `twitter_archive_enhanced.csv`, uploaded it to the Project Workspace on Udacity and read it into the dataframe `twitter_archive` with `pandas`. " ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
tweet_idin_reply_to_status_idin_reply_to_user_idtimestampsourcetextretweeted_status_idretweeted_status_user_idretweeted_status_timestampexpanded_urlsrating_numeratorrating_denominatornamedoggoflooferpupperpuppo
0892420643555336193NaNNaN2017-08-01 16:23:56 +0000<a href=\"http://twitter.com/download/iphone\" r...This is Phineas. He's a mystical boy. Only eve...NaNNaNNaNhttps://twitter.com/dog_rates/status/892420643...1310PhineasNoneNoneNoneNone
\n", "
" ], "text/plain": [ " tweet_id in_reply_to_status_id in_reply_to_user_id \\\n", "0 892420643555336193 NaN NaN \n", "\n", " timestamp \\\n", "0 2017-08-01 16:23:56 +0000 \n", "\n", " source \\\n", "0 \n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
tweet_idjpg_urlimg_nump1p1_confp1_dogp2p2_confp2_dogp3p3_confp3_dog
0666020888022790149https://pbs.twimg.com/media/CT4udn0WwAA0aMy.jpg1Welsh_springer_spaniel0.465074Truecollie0.156665TrueShetland_sheepdog0.061428True
\n", "" ], "text/plain": [ " tweet_id jpg_url \\\n", "0 666020888022790149 https://pbs.twimg.com/media/CT4udn0WwAA0aMy.jpg \n", "\n", " img_num p1 p1_conf p1_dog p2 p2_conf \\\n", "0 1 Welsh_springer_spaniel 0.465074 True collie 0.156665 \n", "\n", " p2_dog p3 p3_conf p3_dog \n", "0 True Shetland_sheepdog 0.061428 True " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# loading image prediction data\n", "\n", "image_predictions = pd.read_csv('image_predictions.tsv', sep = '\\t')\n", "image_predictions.head(1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Gathering Information Via Twitter API\n", "\n", "I wrote a script that uses the `Tweepy` library to fetch Twitter JSON data via the API for the Tweet IDs listed in the `twitter_archive` dataframe, and saved the results to the `tweet_json.txt` file. I uploaded this file to the Project Workspace and added the script to the project notebook without authentication keys. Since the script would raise errors without the keys, I commented out the cell that contains it. I then read the data from `tweet_json.txt` into the dataframe `tweet_jsons`, using the `json` and `pandas` libraries." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# Twitter data gathering script. 
Uncomment and add your keys to run.\n", "\n", "#tokens = {\"consumer_key\": \"\",\n", "# \"consumer_secret\": \"\",\n", "# \"oauth_token\": \"\",\n", "# \"oauth_token_secret\": \"\"}\n", "#\n", "#consumer_key = tokens[\"consumer_key\"]\n", "#consumer_secret = tokens[\"consumer_secret\"]\n", "#oauth_token = tokens[\"oauth_token\"]\n", "#oauth_token_secret = tokens[\"oauth_token_secret\"]\n", "#\n", "#auth = tweepy.OAuthHandler(consumer_key, consumer_secret)\n", "#auth.set_access_token(oauth_token, oauth_token_secret)\n", "#api = tweepy.API(auth, wait_on_rate_limit = True)\n", "#\n", "#filename = 'tweet_json.txt'\n", "#\n", "#try:\n", "# os.remove(filename)\n", "#except OSError:\n", "# pass\n", "#\n", "#tweet_errors = {}\n", "#count = 0\n", "#\n", "#with open(filename, 'a') as f:\n", "# for tweet_id in twitter_archive['tweet_id']:\n", "# try:\n", "# tweet = api.get_status(tweet_id, tweet_mode='extended')\n", "# json.dump(tweet._json, f)\n", "# f.write('\\n')\n", "# count += 1\n", "# except tweepy.TweepError as e:\n", "# print(tweet_id, e.args[0][0]['message'])\n", "# tweet_errors[tweet_id] = e.reason\n", "# time.sleep(1.2)\n", "# if count % 100 == 0:\n", "# print(count)\n", "#\n", "#print(\"Errors:\", tweet_errors)\n", "\n", "#print(\"Count:\", str(count))" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Count: 2340\n" ] } ], "source": [ "# script output: count\n", "\n", "print(\"Count: 2340\")" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "16" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# script output: errors\n", "\n", "errors = {888202515573088257: \"[{'code': 144, 'message': 'No status found with that ID.'}]\", 
\n", " 873697596434513921: \"[{'code': 144, 'message': 'No status found with that ID.'}]\", \n", " 872668790621863937: \"[{'code': 144, 'message': 'No status found with that ID.'}]\", \n", " 869988702071779329: \"[{'code': 144, 'message': 'No status found with that ID.'}]\", \n", " 866816280283807744: \"[{'code': 144, 'message': 'No status found with that ID.'}]\", \n", " 861769973181624320: \"[{'code': 144, 'message': 'No status found with that ID.'}]\", \n", " 845459076796616705: \"[{'code': 144, 'message': 'No status found with that ID.'}]\", \n", " 842892208864923648: \"[{'code': 144, 'message': 'No status found with that ID.'}]\", \n", " 837012587749474308: \"[{'code': 144, 'message': 'No status found with that ID.'}]\", \n", " 827228250799742977: \"[{'code': 144, 'message': 'No status found with that ID.'}]\", \n", " 812747805718642688: \"[{'code': 144, 'message': 'No status found with that ID.'}]\", \n", " 802247111496568832: \"[{'code': 144, 'message': 'No status found with that ID.'}]\", \n", " 775096608509886464: \"[{'code': 144, 'message': 'No status found with that ID.'}]\", \n", " 770743923962707968: \"[{'code': 144, 'message': 'No status found with that ID.'}]\", \n", " 754011816964026368: \"[{'code': 144, 'message': 'No status found with that ID.'}]\", \n", " 680055455951884288: \"[{'code': 144, 'message': 'No status found with that ID.'}]\"}\n", "\n", "len(errors)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are 16 tweets in the original Twitter archive data that are now missing online. For the other 2340 tweets, the additional information on likes and retweets was gathered successfully." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
favorite_countretweet_counttweet_id
0378558260892420643555336193
1325266104892177421306343426
2244924041891815181378084864
3412058410891689557279858688
4393849109891327558926688256
\n", "
" ], "text/plain": [ " favorite_count retweet_count tweet_id\n", "0 37855 8260 892420643555336193\n", "1 32526 6104 892177421306343426\n", "2 24492 4041 891815181378084864\n", "3 41205 8410 891689557279858688\n", "4 39384 9109 891327558926688256" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# reading JSON data from the text file\n", "\n", "json_list = []\n", "\n", "with open('tweet_json.txt') as f:\n", " for line in f.readlines():\n", " a_json = json.loads(line)\n", " json_list.append({'tweet_id': a_json['id'], \n", " 'favorite_count': a_json['favorite_count'], \n", " 'retweet_count': a_json['retweet_count']})\n", " \n", "tweet_jsons = pd.DataFrame(json_list)\n", "tweet_jsons.head()" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
tweet_idfavorite_countretweet_count
0892420643555336193378558260
1892177421306343426325266104
2891815181378084864244924041
3891689557279858688412058410
4891327558926688256393849109
\n", "
" ], "text/plain": [ " tweet_id favorite_count retweet_count\n", "0 892420643555336193 37855 8260\n", "1 892177421306343426 32526 6104\n", "2 891815181378084864 24492 4041\n", "3 891689557279858688 41205 8410\n", "4 891327558926688256 39384 9109" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# rearranging columns\n", "\n", "tweet_jsons = tweet_jsons[['tweet_id', 'favorite_count', 'retweet_count']]\n", "tweet_jsons.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "# Data Assessing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### WeRateDogs Twitter Archive Data" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(2356, 17)" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "twitter_archive.shape" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<class 'pandas.core.frame.DataFrame'>\n", "RangeIndex: 2356 entries, 0 to 2355\n", "Data columns (total 17 columns):\n", "tweet_id 2356 non-null int64\n", "in_reply_to_status_id 78 non-null float64\n", "in_reply_to_user_id 78 non-null float64\n", "timestamp 2356 non-null object\n", "source 2356 non-null object\n", "text 2356 non-null object\n", "retweeted_status_id 181 non-null float64\n", "retweeted_status_user_id 181 non-null float64\n", "retweeted_status_timestamp 181 non-null object\n", "expanded_urls 2297 non-null object\n", "rating_numerator 2356 non-null int64\n", "rating_denominator 2356 non-null int64\n", "name 2356 non-null object\n", "doggo 2356 non-null object\n", "floofer 2356 non-null object\n", "pupper 2356 non-null object\n", "puppo 2356 non-null object\n", "dtypes: float64(4), int64(3), object(10)\n", "memory usage: 313.0+ KB\n" ] } ], "source": [ "twitter_archive.info()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are 17 variables in the `twitter_archive` dataframe: the first 10 come from the original Twitter data, and 7 were added later, based mostly on the content of the tweets. \n", "The timestamp columns have the wrong data type (`object` instead of datetime), as shown above. There are also non-null values in the columns indicating retweets and replies. Retweets should be excluded according to the project guidelines; replies need to be assessed further. 
\n", "Since the four dog stage columns contain no null values, and a dog cannot be in all stages simultaneously, we can assume that negative/missing values in these columns are encoded as strings rather than `NaN`." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
tweet_idin_reply_to_status_idin_reply_to_user_idtimestampsourcetextretweeted_status_idretweeted_status_user_idretweeted_status_timestampexpanded_urlsrating_numeratorrating_denominatornamedoggoflooferpupperpuppo
0892420643555336193NaNNaN2017-08-01 16:23:56 +0000<a href=\"http://twitter.com/download/iphone\" r...This is Phineas. He's a mystical boy. Only eve...NaNNaNNaNhttps://twitter.com/dog_rates/status/892420643...1310PhineasNoneNoneNoneNone
1892177421306343426NaNNaN2017-08-01 00:17:27 +0000<a href=\"http://twitter.com/download/iphone\" r...This is Tilly. She's just checking pup on you....NaNNaNNaNhttps://twitter.com/dog_rates/status/892177421...1310TillyNoneNoneNoneNone
2891815181378084864NaNNaN2017-07-31 00:18:03 +0000<a href=\"http://twitter.com/download/iphone\" r...This is Archie. He is a rare Norwegian Pouncin...NaNNaNNaNhttps://twitter.com/dog_rates/status/891815181...1210ArchieNoneNoneNoneNone
3891689557279858688NaNNaN2017-07-30 15:58:51 +0000<a href=\"http://twitter.com/download/iphone\" r...This is Darla. She commenced a snooze mid meal...NaNNaNNaNhttps://twitter.com/dog_rates/status/891689557...1310DarlaNoneNoneNoneNone
4891327558926688256NaNNaN2017-07-29 16:00:24 +0000<a href=\"http://twitter.com/download/iphone\" r...This is Franklin. He would like you to stop ca...NaNNaNNaNhttps://twitter.com/dog_rates/status/891327558...1210FranklinNoneNoneNoneNone
\n", "
" ], "text/plain": [ " tweet_id in_reply_to_status_id in_reply_to_user_id \\\n", "0 892420643555336193 NaN NaN \n", "1 892177421306343426 NaN NaN \n", "2 891815181378084864 NaN NaN \n", "3 891689557279858688 NaN NaN \n", "4 891327558926688256 NaN NaN \n", "\n", " timestamp \\\n", "0 2017-08-01 16:23:56 +0000 \n", "1 2017-08-01 00:17:27 +0000 \n", "2 2017-07-31 00:18:03 +0000 \n", "3 2017-07-30 15:58:51 +0000 \n", "4 2017-07-29 16:00:24 +0000 \n", "\n", " source \\\n", "0 \n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
tweet_idin_reply_to_status_idin_reply_to_user_idtimestampsourcetextretweeted_status_idretweeted_status_user_idretweeted_status_timestampexpanded_urlsrating_numeratorrating_denominatornamedoggoflooferpupperpuppo
2351666049248165822465NaNNaN2015-11-16 00:24:50 +0000<a href=\"http://twitter.com/download/iphone\" r...Here we have a 1949 1st generation vulpix. Enj...NaNNaNNaNhttps://twitter.com/dog_rates/status/666049248...510NoneNoneNoneNoneNone
2352666044226329800704NaNNaN2015-11-16 00:04:52 +0000<a href=\"http://twitter.com/download/iphone\" r...This is a purebred Piers Morgan. Loves to Netf...NaNNaNNaNhttps://twitter.com/dog_rates/status/666044226...610aNoneNoneNoneNone
2353666033412701032449NaNNaN2015-11-15 23:21:54 +0000<a href=\"http://twitter.com/download/iphone\" r...Here is a very happy pup. Big fan of well-main...NaNNaNNaNhttps://twitter.com/dog_rates/status/666033412...910aNoneNoneNoneNone
2354666029285002620928NaNNaN2015-11-15 23:05:30 +0000<a href=\"http://twitter.com/download/iphone\" r...This is a western brown Mitsubishi terrier. Up...NaNNaNNaNhttps://twitter.com/dog_rates/status/666029285...710aNoneNoneNoneNone
2355666020888022790149NaNNaN2015-11-15 22:32:08 +0000<a href=\"http://twitter.com/download/iphone\" r...Here we have a Japanese Irish Setter. Lost eye...NaNNaNNaNhttps://twitter.com/dog_rates/status/666020888...810NoneNoneNoneNoneNone
\n", "" ], "text/plain": [ " tweet_id in_reply_to_status_id in_reply_to_user_id \\\n", "2351 666049248165822465 NaN NaN \n", "2352 666044226329800704 NaN NaN \n", "2353 666033412701032449 NaN NaN \n", "2354 666029285002620928 NaN NaN \n", "2355 666020888022790149 NaN NaN \n", "\n", " timestamp \\\n", "2351 2015-11-16 00:24:50 +0000 \n", "2352 2015-11-16 00:04:52 +0000 \n", "2353 2015-11-15 23:21:54 +0000 \n", "2354 2015-11-15 23:05:30 +0000 \n", "2355 2015-11-15 22:32:08 +0000 \n", "\n", " source \\\n", "2351
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
tweet_idin_reply_to_status_idin_reply_to_user_idtimestampsourcetextretweeted_status_idretweeted_status_user_idretweeted_status_timestampexpanded_urlsrating_numeratorrating_denominatornamedoggoflooferpupperpuppo
773776249906839351296NaNNaN2016-09-15 02:42:54 +0000<a href=\"http://twitter.com/download/iphone\" r...RT @dog_rates: We only rate dogs. Pls stop sen...7.007478e+174.196984e+092016-02-19 18:24:26 +0000https://twitter.com/dog_rates/status/700747788...1110veryNoneNoneNoneNone
1449696100768806522880NaNNaN2016-02-06 22:38:50 +0000<a href=\"http://vine.co\" rel=\"nofollow\">Vine -...This poor pupper has been stuck in a vortex si...NaNNaNNaNhttps://vine.co/v/i1KWj0vbvA91010NoneNoneNonepupperNone
2135670061506722140161NaNNaN2015-11-27 02:08:07 +0000<a href=\"http://twitter.com/download/iphone\" r...This is Liam. He has a particular set of skill...NaNNaNNaNhttps://twitter.com/dog_rates/status/670061506...1110LiamNoneNoneNoneNone
2232668221241640230912NaNNaN2015-11-22 00:15:33 +0000<a href=\"http://twitter.com/download/iphone\" r...These two dogs are Bo &amp; Smittens. Smittens...NaNNaNNaNhttps://twitter.com/dog_rates/status/668221241...1010NoneNoneNoneNoneNone
21496696848655546204166.693544e+174.196984e+092015-11-26 01:11:28 +0000<a href=\"http://twitter.com/download/iphone\" r...After countless hours of research and hundreds...NaNNaNNaNNaN1110NoneNoneNoneNoneNone
1017746872823977771008NaNNaN2016-06-26 01:08:52 +0000<a href=\"http://twitter.com/download/iphone\" r...This is a carrot. We only rate dogs. Please on...NaNNaNNaNhttps://twitter.com/dog_rates/status/746872823...1110aNoneNoneNoneNone
1595686358356425093120NaNNaN2016-01-11 01:25:58 +0000<a href=\"http://twitter.com/download/iphone\" r...Heartwarming scene here. Son reuniting w fathe...NaNNaNNaNhttps://twitter.com/dog_rates/status/686358356...1010NoneNoneNoneNoneNone
1616685198997565345792NaNNaN2016-01-07 20:39:06 +0000<a href=\"http://twitter.com/download/iphone\" r...This is Alfie. That is his time machine. He's ...NaNNaNNaNhttps://twitter.com/dog_rates/status/685198997...1110AlfieNoneNoneNoneNone
2300667062181243039745NaNNaN2015-11-18 19:29:52 +0000<a href=\"http://twitter.com/download/iphone\" r...This is Keet. He is a Floridian Amukamara. Abs...NaNNaNNaNhttps://twitter.com/dog_rates/status/667062181...1010KeetNoneNoneNoneNone
1086738166403467907072NaNNaN2016-06-02 00:32:39 +0000<a href=\"http://twitter.com/download/iphone\" r...This is Axel. He's a professional leaf catcher...NaNNaNNaNhttps://twitter.com/dog_rates/status/738166403...1210AxelNoneNoneNoneNone
\n", "" ], "text/plain": [ " tweet_id in_reply_to_status_id in_reply_to_user_id \\\n", "773 776249906839351296 NaN NaN \n", "1449 696100768806522880 NaN NaN \n", "2135 670061506722140161 NaN NaN \n", "2232 668221241640230912 NaN NaN \n", "2149 669684865554620416 6.693544e+17 4.196984e+09 \n", "1017 746872823977771008 NaN NaN \n", "1595 686358356425093120 NaN NaN \n", "1616 685198997565345792 NaN NaN \n", "2300 667062181243039745 NaN NaN \n", "1086 738166403467907072 NaN NaN \n", "\n", " timestamp \\\n", "773 2016-09-15 02:42:54 +0000 \n", "1449 2016-02-06 22:38:50 +0000 \n", "2135 2015-11-27 02:08:07 +0000 \n", "2232 2015-11-22 00:15:33 +0000 \n", "2149 2015-11-26 01:11:28 +0000 \n", "1017 2016-06-26 01:08:52 +0000 \n", "1595 2016-01-11 01:25:58 +0000 \n", "1616 2016-01-07 20:39:06 +0000 \n", "2300 2015-11-18 19:29:52 +0000 \n", "1086 2016-06-02 00:32:39 +0000 \n", "\n", " source \\\n", "773
Vine -... \n", "2135 Twitter for iPhone 2221\n", "Vine - Make a Scene 91\n", "Twitter Web Client 33\n", "TweetDeck 11\n", "Name: source, dtype: int64" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "twitter_archive.source.value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `source` variable can be converted to a category type, since it has a limited number of values. However, the HTML markup should be stripped for readability." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
tweet_idexpanded_urls
1452695767669421768709https://twitter.com/dog_rates/status/695767669421768709/photo/1
1526690374419777196032https://twitter.com/dog_rates/status/690374419777196032/photo/1
1377701601587219795968https://twitter.com/dog_rates/status/701601587219795968/photo/1
871761599872357261312https://twitter.com/dog_rates/status/761599872357261312/photo/1
343832040443403784192https://twitter.com/dog_rates/status/769940425801170949/photo/1,https://twitter.com/dog_rates/status/769940425801170949/photo/1,https://twitter.com/dog_rates/status/769940425801170949/photo/1,https://twitter.com/dog_rates/status/769940425801170949/photo/1
2307666826780179869698https://twitter.com/dog_rates/status/666826780179869698/photo/1
1588686730991906516992https://twitter.com/dog_rates/status/686730991906516992/photo/1
1040744223424764059648https://twitter.com/strange_animals/status/672108316018024452
990748705597323898880https://twitter.com/dog_rates/status/748705597323898880/video/1
1015747103485104099331https://twitter.com/dog_rates/status/747103485104099331/photo/1,https://twitter.com/dog_rates/status/747103485104099331/photo/1,https://twitter.com/dog_rates/status/747103485104099331/photo/1,https://twitter.com/dog_rates/status/747103485104099331/photo/1
\n", "
" ], "text/plain": [ " tweet_id \\\n", "1452 695767669421768709 \n", "1526 690374419777196032 \n", "1377 701601587219795968 \n", "871 761599872357261312 \n", "343 832040443403784192 \n", "2307 666826780179869698 \n", "1588 686730991906516992 \n", "1040 744223424764059648 \n", "990 748705597323898880 \n", "1015 747103485104099331 \n", "\n", " expanded_urls \n", "1452 https://twitter.com/dog_rates/status/695767669421768709/photo/1 \n", "1526 https://twitter.com/dog_rates/status/690374419777196032/photo/1 \n", "1377 https://twitter.com/dog_rates/status/701601587219795968/photo/1 \n", "871 https://twitter.com/dog_rates/status/761599872357261312/photo/1 \n", "343 https://twitter.com/dog_rates/status/769940425801170949/photo/1,https://twitter.com/dog_rates/status/769940425801170949/photo/1,https://twitter.com/dog_rates/status/769940425801170949/photo/1,https://twitter.com/dog_rates/status/769940425801170949/photo/1 \n", "2307 https://twitter.com/dog_rates/status/666826780179869698/photo/1 \n", "1588 https://twitter.com/dog_rates/status/686730991906516992/photo/1 \n", "1040 https://twitter.com/strange_animals/status/672108316018024452 \n", "990 https://twitter.com/dog_rates/status/748705597323898880/video/1 \n", "1015 https://twitter.com/dog_rates/status/747103485104099331/photo/1,https://twitter.com/dog_rates/status/747103485104099331/photo/1,https://twitter.com/dog_rates/status/747103485104099331/photo/1,https://twitter.com/dog_rates/status/747103485104099331/photo/1 " ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.set_option('display.max_colwidth', -1)\n", "\n", "twitter_archive[twitter_archive.expanded_urls.notnull()][['tweet_id', 'expanded_urls']].sample(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As can be seen from the table above, some tweets have duplicated URLs in `expanded_urls` column, which may come from `entities` and `extended_entities` JSON fields of original archive data." 
] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "10 2333\n", "11 3 \n", "50 3 \n", "80 2 \n", "20 2 \n", "2 1 \n", "16 1 \n", "40 1 \n", "70 1 \n", "15 1 \n", "90 1 \n", "110 1 \n", "120 1 \n", "130 1 \n", "150 1 \n", "170 1 \n", "7 1 \n", "0 1 \n", "Name: rating_denominator, dtype: int64" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "twitter_archive.rating_denominator.value_counts()" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "0 2 \n", "1 9 \n", "2 9 \n", "3 19 \n", "4 17 \n", "5 37 \n", "6 32 \n", "7 55 \n", "8 102\n", "9 158\n", "10 461\n", "11 464\n", "12 558\n", "13 351\n", "14 54 \n", "15 2 \n", "17 1 \n", "20 1 \n", "24 1 \n", "26 1 \n", "27 1 \n", "44 1 \n", "45 1 \n", "50 1 \n", "60 1 \n", "75 2 \n", "80 1 \n", "84 1 \n", "88 1 \n", "99 1 \n", "121 1 \n", "143 1 \n", "144 1 \n", "165 1 \n", "182 1 \n", "204 1 \n", "420 2 \n", "666 1 \n", "960 1 \n", "1776 1 \n", "Name: rating_numerator, dtype: int64" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "twitter_archive.rating_numerator.value_counts().sort_index()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Though in general the rating is expected to be in M/N format, where N is 10 and M is below or slightly above 10, there are numbers in these two columns that don't fit this pattern. They will require further investigation during cleaning. Also, these two columns should be combined into a single calculated `rating` column for use in further analysis." ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
rating_numeratorrating_denominatortext
301210@NonWhiteHat @MayhewMayhem omg hello tanner you are a scary good boy 12/10 would pet with extreme caution
551710@roushfenway These are good dogs but 17/10 is an emotional impulse rating. More like 13/10s
641410@RealKentMurphy 14/10 confirmed
1131010@ComplicitOwl @ShopWeRateDogs &gt;10/10 is reserved for dogs
1481210@Jack_Septic_Eye I'd need a few more pics to polish a full analysis, but based on the good boy content above I'm leaning towards 12/10
1491410Ladies and gentlemen... I found Pipsy. He may have changed his name to Pablo, but he never changed his love for the sea. Pupgraded to 14/10 https://t.co/lVU5GyNFen
1791210@Marc_IRL pixelated af 12/10
1841410THIS IS CHARLIE, MARK. HE DID JUST WANT TO SAY HI AFTER ALL. PUPGRADED TO A 14/10. WOULD BE AN HONOR TO FLY WITH https://t.co/p1hBHCmWnA
1861410@xianmcguire @Jenna_Marbles Kardashians wouldn't be famous if as a society we didn't place enormous value on what they do. The dogs are very deserving of their 14/10
18842010@dhmontgomery We also gave snoop dogg a 420/10 but I think that predated your research
18966610@s8n You tried very hard to portray this good boy as not so good, but you have ultimately failed. His goodness shines through. 666/10
2181310@markhoppus MARK THAT DOG HAS SEEN AND EXPERIENCED MANY THINGS. PROBABLY LOST OTHER EAR DOING SOMETHING HEROIC. 13/10 HUG THE DOG HOPPUS
2281110Jerry just apuppologized to me. He said there was no ill-intent to the slippage. I overreacted I admit. Pupgraded to an 11/10 would pet
2341310.@breaannanicolee PUPDATE: Cannon has a heart on his nose. Pupgraded to a 13/10
2511310PUPDATE: I'm proud to announce that Toby is 236 days sober. Pupgraded to a 13/10. We're all very proud of you, Toby https://t.co/a5OaJeRl9B
2741010@0_kelvin_0 &gt;10/10 is reserved for puppos sorry Kevin
29018210@markhoppus 182/10
2911510@bragg6of8 @Andy_Pace_ we are still looking for the first 15/10
3139600@jonnysun @Lin_Manuel ok jomny I know you're excited but 960/00 isn't a valid rating, 13/10 is tho
3421115@docmisterio account started on 11/15/15
3461210@UNC can confirm 12/10
387710I was going to do 007/10, but the joke wasn't worth the &lt;10 rating
4091310@HistoryInPics 13/10
4271310@imgur for a polar bear tho I'd say 13/10 is appropriate
4981210I've been informed by multiple sources that this is actually a dog elf who's tired from helping Santa all night. Pupgraded to 12/10
5131110PUPDATE: I've been informed that Augie was actually bringing his family these flowers when he tripped. Very good boy. Pupgraded to 11/10
5651110Like doggo, like pupper version 2. Both 11/10 https://t.co/9IxWAXFqze
5701110.@NBCSports OMG THE TINY HAT I'M GOING TO HAVE TO SAY 11/10 NBC
5761110@SkyWilliams doggo simply protecting you from evil that which you cannot see. 11/10 would give extra pets
6111110@JODYHiGHROLLER it may be an 11/10 but what do I know 😉
............
14791110Personally I'd give him an 11/10. Not sure why you think you're qualified to rate such a stellar pup.\\n@CommonWhiteGirI
1497910PUPDATE: just noticed this dog has some extra legs. Very advanced. Revolutionary af. Upgraded to a 9/10
15011310These are some pictures of Teddy that further justify his 13/10 rating. Please enjoy https://t.co/tDkJAnQsbQ
1523121012/10 @LightningHoltt
1598420Yes I do realize a rating of 4/20 would've been fitting. However, it would be unjust to give these cooperative pups that low of a rating
16051410Jack deserves another round of applause. If you missed this earlier today I strongly suggest reading it. Wonderful first 14/10 🐶❤️
1618510For those who claim this is a goat, u are wrong. It is not the Greatest Of All Time. The rating of 5/10 should have made that clear. Thank u
16301210After watching this video, we've determined that Pippa will be upgraded to a 12/10. Please enjoy https://t.co/IKoRK4yoxV
1634143130Two sneaky puppers were not initially seen, moving the rating to 143/130. Please forgive us. Thank you https://t.co/kRK51Y5ac3
16632016I'm aware that I could've said 20/16, but here at WeRateDogs we are very professional. An inconsistent rating scale is simply irresponsible
1689510I've been told there's a slight possibility he's checking his mirror. We'll bump to 9.5/10. Still a menace
17741310After getting lost in Reese's eyes for several minutes we're going to upgrade him to a 13/10
1819710After some outrage from the crowd. Bubbles is being upgraded to a 7/10. That's as high as I'm going. Thank you
18421110&amp; this is Yoshi. Another world record contender 11/10 (what the hell is happening why are there so many contenders?) https://t.co/QG708dDNH6
1844910This dog is being demoted to a 9/10 for not wearing a helmet while riding. Gotta stay safe out there. Thank you
18521110We've got ourselves a battle here. Watch out Reggie. 11/10 https://t.co/ALJvbtcwf0
18661310Yea I lied. Here's more. All 13/10 https://t.co/ZQZf2U4xCP
18821310Ok last one of these. I may try to make some myself. Anyway here ya go. 13/10 https://t.co/i9CDd1oEu8
18851310I have found another. 13/10 https://t.co/HwroPYv8pY
18921210Just received another perfect photo of dogs and the sunset. 12/10 https://t.co/9YmNcxA2Cc
18951110Some clarification is required. The dog is singing Cher and that is more than worthy of an 11/10. Thank you
19051310The 13/10 also takes into account this impeccable yard. Louis is great but the future dad in me can't ignore that luscious green grass
1914131013/10\\n@ABC7
1940110The millennials have spoken and we've decided to immediately demote to a 1/10. Thank you
20361310I'm just going to leave this one here as well. 13/10 https://t.co/DaD5SyajWt
2038110After 22 minutes of careful deliberation this dog is being demoted to a 1/10. The longer you look at him the more terrifying he becomes
21491110After countless hours of research and hundreds of formula alterations we have concluded that Dug should be bumped to an 11/10
21691010This is Tessa. She is also very pleased after finally meeting her biological father. 10/10 https://t.co/qDS1aCqppv
2189121012/10 good shit Bubka\\n@wane15
22981010After much debate this dog is being upgraded to 10/10. I repeat 10/10
\n", "

78 rows × 3 columns

\n", "
" ], "text/plain": [ " rating_numerator rating_denominator \\\n", "30 12 10 \n", "55 17 10 \n", "64 14 10 \n", "113 10 10 \n", "148 12 10 \n", "149 14 10 \n", "179 12 10 \n", "184 14 10 \n", "186 14 10 \n", "188 420 10 \n", "189 666 10 \n", "218 13 10 \n", "228 11 10 \n", "234 13 10 \n", "251 13 10 \n", "274 10 10 \n", "290 182 10 \n", "291 15 10 \n", "313 960 0 \n", "342 11 15 \n", "346 12 10 \n", "387 7 10 \n", "409 13 10 \n", "427 13 10 \n", "498 12 10 \n", "513 11 10 \n", "565 11 10 \n", "570 11 10 \n", "576 11 10 \n", "611 11 10 \n", "... .. .. \n", "1479 11 10 \n", "1497 9 10 \n", "1501 13 10 \n", "1523 12 10 \n", "1598 4 20 \n", "1605 14 10 \n", "1618 5 10 \n", "1630 12 10 \n", "1634 143 130 \n", "1663 20 16 \n", "1689 5 10 \n", "1774 13 10 \n", "1819 7 10 \n", "1842 11 10 \n", "1844 9 10 \n", "1852 11 10 \n", "1866 13 10 \n", "1882 13 10 \n", "1885 13 10 \n", "1892 12 10 \n", "1895 11 10 \n", "1905 13 10 \n", "1914 13 10 \n", "1940 1 10 \n", "2036 13 10 \n", "2038 1 10 \n", "2149 11 10 \n", "2169 10 10 \n", "2189 12 10 \n", "2298 10 10 \n", "\n", " text \n", "30 @NonWhiteHat @MayhewMayhem omg hello tanner you are a scary good boy 12/10 would pet with extreme caution \n", "55 @roushfenway These are good dogs but 17/10 is an emotional impulse rating. More like 13/10s \n", "64 @RealKentMurphy 14/10 confirmed \n", "113 @ComplicitOwl @ShopWeRateDogs >10/10 is reserved for dogs \n", "148 @Jack_Septic_Eye I'd need a few more pics to polish a full analysis, but based on the good boy content above I'm leaning towards 12/10 \n", "149 Ladies and gentlemen... I found Pipsy. He may have changed his name to Pablo, but he never changed his love for the sea. Pupgraded to 14/10 https://t.co/lVU5GyNFen \n", "179 @Marc_IRL pixelated af 12/10 \n", "184 THIS IS CHARLIE, MARK. HE DID JUST WANT TO SAY HI AFTER ALL. PUPGRADED TO A 14/10. 
WOULD BE AN HONOR TO FLY WITH https://t.co/p1hBHCmWnA \n", "186 @xianmcguire @Jenna_Marbles Kardashians wouldn't be famous if as a society we didn't place enormous value on what they do. The dogs are very deserving of their 14/10 \n", "188 @dhmontgomery We also gave snoop dogg a 420/10 but I think that predated your research \n", "189 @s8n You tried very hard to portray this good boy as not so good, but you have ultimately failed. His goodness shines through. 666/10 \n", "218 @markhoppus MARK THAT DOG HAS SEEN AND EXPERIENCED MANY THINGS. PROBABLY LOST OTHER EAR DOING SOMETHING HEROIC. 13/10 HUG THE DOG HOPPUS \n", "228 Jerry just apuppologized to me. He said there was no ill-intent to the slippage. I overreacted I admit. Pupgraded to an 11/10 would pet \n", "234 .@breaannanicolee PUPDATE: Cannon has a heart on his nose. Pupgraded to a 13/10 \n", "251 PUPDATE: I'm proud to announce that Toby is 236 days sober. Pupgraded to a 13/10. We're all very proud of you, Toby https://t.co/a5OaJeRl9B \n", "274 @0_kelvin_0 >10/10 is reserved for puppos sorry Kevin \n", "290 @markhoppus 182/10 \n", "291 @bragg6of8 @Andy_Pace_ we are still looking for the first 15/10 \n", "313 @jonnysun @Lin_Manuel ok jomny I know you're excited but 960/00 isn't a valid rating, 13/10 is tho \n", "342 @docmisterio account started on 11/15/15 \n", "346 @UNC can confirm 12/10 \n", "387 I was going to do 007/10, but the joke wasn't worth the <10 rating \n", "409 @HistoryInPics 13/10 \n", "427 @imgur for a polar bear tho I'd say 13/10 is appropriate \n", "498 I've been informed by multiple sources that this is actually a dog elf who's tired from helping Santa all night. Pupgraded to 12/10 \n", "513 PUPDATE: I've been informed that Augie was actually bringing his family these flowers when he tripped. Very good boy. Pupgraded to 11/10 \n", "565 Like doggo, like pupper version 2. 
Both 11/10 https://t.co/9IxWAXFqze \n", "570 .@NBCSports OMG THE TINY HAT I'M GOING TO HAVE TO SAY 11/10 NBC \n", "576 @SkyWilliams doggo simply protecting you from evil that which you cannot see. 11/10 would give extra pets \n", "611 @JODYHiGHROLLER it may be an 11/10 but what do I know 😉 \n", "... ... \n", "1479 Personally I'd give him an 11/10. Not sure why you think you're qualified to rate such a stellar pup.\\n@CommonWhiteGirI \n", "1497 PUPDATE: just noticed this dog has some extra legs. Very advanced. Revolutionary af. Upgraded to a 9/10 \n", "1501 These are some pictures of Teddy that further justify his 13/10 rating. Please enjoy https://t.co/tDkJAnQsbQ \n", "1523 12/10 @LightningHoltt \n", "1598 Yes I do realize a rating of 4/20 would've been fitting. However, it would be unjust to give these cooperative pups that low of a rating \n", "1605 Jack deserves another round of applause. If you missed this earlier today I strongly suggest reading it. Wonderful first 14/10 🐶❤️ \n", "1618 For those who claim this is a goat, u are wrong. It is not the Greatest Of All Time. The rating of 5/10 should have made that clear. Thank u \n", "1630 After watching this video, we've determined that Pippa will be upgraded to a 12/10. Please enjoy https://t.co/IKoRK4yoxV \n", "1634 Two sneaky puppers were not initially seen, moving the rating to 143/130. Please forgive us. Thank you https://t.co/kRK51Y5ac3 \n", "1663 I'm aware that I could've said 20/16, but here at WeRateDogs we are very professional. An inconsistent rating scale is simply irresponsible \n", "1689 I've been told there's a slight possibility he's checking his mirror. We'll bump to 9.5/10. Still a menace \n", "1774 After getting lost in Reese's eyes for several minutes we're going to upgrade him to a 13/10 \n", "1819 After some outrage from the crowd. Bubbles is being upgraded to a 7/10. That's as high as I'm going. Thank you \n", "1842 & this is Yoshi. 
Another world record contender 11/10 (what the hell is happening why are there so many contenders?) https://t.co/QG708dDNH6 \n", "1844 This dog is being demoted to a 9/10 for not wearing a helmet while riding. Gotta stay safe out there. Thank you \n", "1852 We've got ourselves a battle here. Watch out Reggie. 11/10 https://t.co/ALJvbtcwf0 \n", "1866 Yea I lied. Here's more. All 13/10 https://t.co/ZQZf2U4xCP \n", "1882 Ok last one of these. I may try to make some myself. Anyway here ya go. 13/10 https://t.co/i9CDd1oEu8 \n", "1885 I have found another. 13/10 https://t.co/HwroPYv8pY \n", "1892 Just received another perfect photo of dogs and the sunset. 12/10 https://t.co/9YmNcxA2Cc \n", "1895 Some clarification is required. The dog is singing Cher and that is more than worthy of an 11/10. Thank you \n", "1905 The 13/10 also takes into account this impeccable yard. Louis is great but the future dad in me can't ignore that luscious green grass \n", "1914 13/10\\n@ABC7 \n", "1940 The millennials have spoken and we've decided to immediately demote to a 1/10. Thank you \n", "2036 I'm just going to leave this one here as well. 13/10 https://t.co/DaD5SyajWt \n", "2038 After 22 minutes of careful deliberation this dog is being demoted to a 1/10. The longer you look at him the more terrifying he becomes \n", "2149 After countless hours of research and hundreds of formula alterations we have concluded that Dug should be bumped to an 11/10 \n", "2169 This is Tessa. She is also very pleased after finally meeting her biological father. 10/10 https://t.co/qDS1aCqppv \n", "2189 12/10 good shit Bubka\\n@wane15 \n", "2298 After much debate this dog is being upgraded to 10/10. 
I repeat 10/10 \n", "\n", "[78 rows x 3 columns]" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.set_option('display.max_colwidth', -1)\n", "\n", "twitter_archive[twitter_archive.in_reply_to_status_id.notnull()][[\"rating_numerator\", \"rating_denominator\", \"text\"]]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Replies are inconsistent: some lack images, some only add a comment to an original @dog_rates tweet, and some are not about dogs at all. For consistency of information, it may be useful to exclude replies together with retweets." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "None 2259\n", "doggo 97 \n", "Name: doggo, dtype: int64" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "twitter_archive.doggo.value_counts()" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "None 2099\n", "pupper 257 \n", "Name: pupper, dtype: int64" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "twitter_archive.pupper.value_counts()" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "None 2326\n", "puppo 30 \n", "Name: puppo, dtype: int64" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "twitter_archive.puppo.value_counts()" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "None 2346\n", "floofer 10 \n", "Name: floofer, dtype: int64" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "twitter_archive.floofer.value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As can be seen from the output above, the missing values are encoded with \"None\" in string 
format. Also, the `pupper`, `puppo` and `doggo` columns may be combined into one `dog_stages` column and used as an ordinal categorical variable with three levels." ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Names 2247\n", "a 55 \n", "the 8 \n", "an 7 \n", "very 5 \n", "quite 4 \n", "just 4 \n", "one 4 \n", "actually 2 \n", "getting 2 \n", "mad 2 \n", "not 2 \n", "officially 1 \n", "my 1 \n", "this 1 \n", "old 1 \n", "infuriating 1 \n", "his 1 \n", "such 1 \n", "incredibly 1 \n", "unacceptable 1 \n", "space 1 \n", "all 1 \n", "life 1 \n", "by 1 \n", "light 1 \n", "dtype: int64" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "twitter_archive[twitter_archive.name.notnull()].apply(lambda x: x['name'] \n", " if x['name'][0].islower() else \"Names\", \n", " axis = 1).value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `name` column contains many non-name words that were extracted from the text and should be excluded. Still, some tweets may contain a name elsewhere in the text, not where the extraction expected it. This will require further investigation during data cleaning." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Quality Issues In Twitter Archive Data\n", "\n", "1. 78 tweets are replies and can't be counted as tweets of the \"standard format\" with an image, a text presenting the dog in the image and a rating number, as they often lack some of this information. \n", "2. 181 tweets have non-null values in `retweeted_status_id` and `retweeted_status_user_id`, which means that these tweets are actually retweets, contrary to the project guidelines. \n", "3. Missing values in the dog stage columns and in the `floofer` and `name` columns are encoded with \"None\" strings rather than `pandas` `NaN` values. \n", "4. Values in the `timestamp` and `retweeted_status_timestamp` columns are not in datetime format. \n", "5. 
Values in the `in_reply_to_status_id`, `in_reply_to_user_id`, `retweeted_status_id` and `retweeted_status_user_id` columns are stored as floats and displayed in scientific notation. If retweets and replies are removed, however, no additional action is needed: these columns, together with `retweeted_status_timestamp`, will contain only null values and can be dropped. \n", "6. Some rating numerators are too large for the \"M/10\" pattern - it is fine for M to exceed 10 by a few points, but not by several times. Some are unexpectedly low. \n", "7. Some rating denominators aren't equal to 10. \n", "8. The `name` column contains articles and other \"non-name\" words.\n", "9. Values in the `source` column are links with HTML wrapped around the actual content, which hurts readability. Also, the type of the column should be category. \n", "10. Duplicated URLs in the same cells of the `expanded_urls` column. \n", "\n", "#### Tidiness Issues In Twitter Archive Data\n", "\n", "1. Dog stage columns - `pupper`, `puppo` and `doggo` - may be combined into one column as the levels of a single categorical variable. Still, a tweet may carry two stage values when several dogs appear in one picture. \n", "2. Rating columns - `rating_numerator` and `rating_denominator` - should be used to calculate a single rating value in float format for use in the analysis. 
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Image Prediction Data" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(2075, 12)" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "image_predictions.shape" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 2075 entries, 0 to 2074\n", "Data columns (total 12 columns):\n", "tweet_id 2075 non-null int64\n", "jpg_url 2075 non-null object\n", "img_num 2075 non-null int64\n", "p1 2075 non-null object\n", "p1_conf 2075 non-null float64\n", "p1_dog 2075 non-null bool\n", "p2 2075 non-null object\n", "p2_conf 2075 non-null float64\n", "p2_dog 2075 non-null bool\n", "p3 2075 non-null object\n", "p3_conf 2075 non-null float64\n", "p3_dog 2075 non-null bool\n", "dtypes: bool(3), float64(3), int64(2), object(4)\n", "memory usage: 152.1+ KB\n" ] } ], "source": [ "image_predictions.info()" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
tweet_idjpg_urlimg_nump1p1_confp1_dogp2p2_confp2_dogp3p3_confp3_dog
0666020888022790149https://pbs.twimg.com/media/CT4udn0WwAA0aMy.jpg1Welsh_springer_spaniel0.465074Truecollie0.156665TrueShetland_sheepdog0.061428True
1666029285002620928https://pbs.twimg.com/media/CT42GRgUYAA5iDo.jpg1redbone0.506826Trueminiature_pinscher0.074192TrueRhodesian_ridgeback0.072010True
2666033412701032449https://pbs.twimg.com/media/CT4521TWwAEvMyu.jpg1German_shepherd0.596461Truemalinois0.138584Truebloodhound0.116197True
3666044226329800704https://pbs.twimg.com/media/CT5Dr8HUEAA-lEu.jpg1Rhodesian_ridgeback0.408143Trueredbone0.360687Trueminiature_pinscher0.222752True
4666049248165822465https://pbs.twimg.com/media/CT5IQmsXIAAKY4A.jpg1miniature_pinscher0.560311TrueRottweiler0.243682TrueDoberman0.154629True
\n", "
" ], "text/plain": [ " tweet_id jpg_url \\\n", "0 666020888022790149 https://pbs.twimg.com/media/CT4udn0WwAA0aMy.jpg \n", "1 666029285002620928 https://pbs.twimg.com/media/CT42GRgUYAA5iDo.jpg \n", "2 666033412701032449 https://pbs.twimg.com/media/CT4521TWwAEvMyu.jpg \n", "3 666044226329800704 https://pbs.twimg.com/media/CT5Dr8HUEAA-lEu.jpg \n", "4 666049248165822465 https://pbs.twimg.com/media/CT5IQmsXIAAKY4A.jpg \n", "\n", " img_num p1 p1_conf p1_dog p2 \\\n", "0 1 Welsh_springer_spaniel 0.465074 True collie \n", "1 1 redbone 0.506826 True miniature_pinscher \n", "2 1 German_shepherd 0.596461 True malinois \n", "3 1 Rhodesian_ridgeback 0.408143 True redbone \n", "4 1 miniature_pinscher 0.560311 True Rottweiler \n", "\n", " p2_conf p2_dog p3 p3_conf p3_dog \n", "0 0.156665 True Shetland_sheepdog 0.061428 True \n", "1 0.074192 True Rhodesian_ridgeback 0.072010 True \n", "2 0.138584 True bloodhound 0.116197 True \n", "3 0.360687 True miniature_pinscher 0.222752 True \n", "4 0.243682 True Doberman 0.154629 True " ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "image_predictions.head()" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
tweet_idjpg_urlimg_nump1p1_confp1_dogp2p2_confp2_dogp3p3_confp3_dog
2070891327558926688256https://pbs.twimg.com/media/DF6hr6BUMAAzZgT.jpg2basset0.555712TrueEnglish_springer0.225770TrueGerman_short-haired_pointer0.175219True
2071891689557279858688https://pbs.twimg.com/media/DF_q7IAWsAEuuN8.jpg1paper_towel0.170278FalseLabrador_retriever0.168086Truespatula0.040836False
2072891815181378084864https://pbs.twimg.com/media/DGBdLU1WsAANxJ9.jpg1Chihuahua0.716012Truemalamute0.078253Truekelpie0.031379True
2073892177421306343426https://pbs.twimg.com/media/DGGmoV4XsAAUL6n.jpg1Chihuahua0.323581TruePekinese0.090647Truepapillon0.068957True
2074892420643555336193https://pbs.twimg.com/media/DGKD1-bXoAAIAUK.jpg1orange0.097049Falsebagel0.085851Falsebanana0.076110False
\n", "
" ], "text/plain": [ " tweet_id jpg_url \\\n", "2070 891327558926688256 https://pbs.twimg.com/media/DF6hr6BUMAAzZgT.jpg \n", "2071 891689557279858688 https://pbs.twimg.com/media/DF_q7IAWsAEuuN8.jpg \n", "2072 891815181378084864 https://pbs.twimg.com/media/DGBdLU1WsAANxJ9.jpg \n", "2073 892177421306343426 https://pbs.twimg.com/media/DGGmoV4XsAAUL6n.jpg \n", "2074 892420643555336193 https://pbs.twimg.com/media/DGKD1-bXoAAIAUK.jpg \n", "\n", " img_num p1 p1_conf p1_dog p2 p2_conf \\\n", "2070 2 basset 0.555712 True English_springer 0.225770 \n", "2071 1 paper_towel 0.170278 False Labrador_retriever 0.168086 \n", "2072 1 Chihuahua 0.716012 True malamute 0.078253 \n", "2073 1 Chihuahua 0.323581 True Pekinese 0.090647 \n", "2074 1 orange 0.097049 False bagel 0.085851 \n", "\n", " p2_dog p3 p3_conf p3_dog \n", "2070 True German_short-haired_pointer 0.175219 True \n", "2071 True spatula 0.040836 False \n", "2072 True kelpie 0.031379 True \n", "2073 True papillon 0.068957 True \n", "2074 False banana 0.076110 False " ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "image_predictions.tail()" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
tweet_idjpg_urlimg_nump1p1_confp1_dogp2p2_confp2_dogp3p3_confp3_dog
656682259524040966145https://pbs.twimg.com/media/CXffar9WYAArfpw.jpg1Siberian_husky0.439670TrueEskimo_dog0.340474Truemalamute0.101253True
916701545186879471618https://pbs.twimg.com/media/CbxjnyOWAAAWLUH.jpg1Border_collie0.280893TrueCardigan0.112550Truetoy_terrier0.053317True
1034711732680602345472https://pbs.twimg.com/media/CeCVGEbUYAASeY4.jpg3dingo0.366875FalseIbizan_hound0.334929TrueEskimo_dog0.073876True
344672267570918129665https://pbs.twimg.com/media/CVRfyZxWUAAFIQR.jpg1Irish_terrier0.716932Trueminiature_pinscher0.051234TrueAiredale0.044381True
782690005060500217858https://pbs.twimg.com/media/CZNj8N-WQAMXASZ.jpg1Samoyed0.270287TrueGreat_Pyrenees0.114027Trueteddy0.072475False
1735821765923262631936https://pbs.twimg.com/media/C2d_vnHWEAE9phX.jpg1golden_retriever0.980071TrueLabrador_retriever0.008758TrueSaluki0.001806True
698684567543613382656https://pbs.twimg.com/media/CYASi6FWQAEQMW2.jpg1minibus0.401942Falsellama0.229145Falseseat_belt0.209393False
1245747512671126323200https://pbs.twimg.com/media/Cl-yykwWkAAqUCE.jpg1Cardigan0.111493Truemalinois0.095089TrueGerman_shepherd0.080146True
1737821886076407029760https://pbs.twimg.com/media/C2ftAxnWIAEUdAR.jpg1golden_retriever0.266238Truecocker_spaniel0.223325TrueIrish_setter0.151631True
1125727314416056803329https://pbs.twimg.com/media/Chfwmd9U4AQTf1b.jpg2toy_poodle0.827469Trueminiature_poodle0.160760TrueTibetan_terrier0.001731True
\n", "
" ], "text/plain": [ " tweet_id jpg_url \\\n", "656 682259524040966145 https://pbs.twimg.com/media/CXffar9WYAArfpw.jpg \n", "916 701545186879471618 https://pbs.twimg.com/media/CbxjnyOWAAAWLUH.jpg \n", "1034 711732680602345472 https://pbs.twimg.com/media/CeCVGEbUYAASeY4.jpg \n", "344 672267570918129665 https://pbs.twimg.com/media/CVRfyZxWUAAFIQR.jpg \n", "782 690005060500217858 https://pbs.twimg.com/media/CZNj8N-WQAMXASZ.jpg \n", "1735 821765923262631936 https://pbs.twimg.com/media/C2d_vnHWEAE9phX.jpg \n", "698 684567543613382656 https://pbs.twimg.com/media/CYASi6FWQAEQMW2.jpg \n", "1245 747512671126323200 https://pbs.twimg.com/media/Cl-yykwWkAAqUCE.jpg \n", "1737 821886076407029760 https://pbs.twimg.com/media/C2ftAxnWIAEUdAR.jpg \n", "1125 727314416056803329 https://pbs.twimg.com/media/Chfwmd9U4AQTf1b.jpg \n", "\n", " img_num p1 p1_conf p1_dog p2 \\\n", "656 1 Siberian_husky 0.439670 True Eskimo_dog \n", "916 1 Border_collie 0.280893 True Cardigan \n", "1034 3 dingo 0.366875 False Ibizan_hound \n", "344 1 Irish_terrier 0.716932 True miniature_pinscher \n", "782 1 Samoyed 0.270287 True Great_Pyrenees \n", "1735 1 golden_retriever 0.980071 True Labrador_retriever \n", "698 1 minibus 0.401942 False llama \n", "1245 1 Cardigan 0.111493 True malinois \n", "1737 1 golden_retriever 0.266238 True cocker_spaniel \n", "1125 2 toy_poodle 0.827469 True miniature_poodle \n", "\n", " p2_conf p2_dog p3 p3_conf p3_dog \n", "656 0.340474 True malamute 0.101253 True \n", "916 0.112550 True toy_terrier 0.053317 True \n", "1034 0.334929 True Eskimo_dog 0.073876 True \n", "344 0.051234 True Airedale 0.044381 True \n", "782 0.114027 True teddy 0.072475 False \n", "1735 0.008758 True Saluki 0.001806 True \n", "698 0.229145 False seat_belt 0.209393 False \n", "1245 0.095089 True German_shepherd 0.080146 True \n", "1737 0.223325 True Irish_setter 0.151631 True \n", "1125 0.160760 True Tibetan_terrier 0.001731 True " ] }, "execution_count": 29, "metadata": {}, "output_type": 
"execute_result" } ], "source": [ "image_predictions.sample(10)" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "281" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "twitter_archive.shape[0] - image_predictions.shape[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Quality Issues in Image Prediction Data\n", "\n", "- Missing information for 281 tweets in Twitter Archive Data.\n", "- Underscores in predictions may be changed to spaces for readability. \n", "- Predictions may be changed to category type. \n", "\n", "In some cases it may by reasonable also to combine the predictions into three columns: \n", " >Number Of Prediction | Dog Breed | Confidence \n", " \n", "but for the purpose of this project where such changes may lead to many rows with the same tweets, it seems unreasonable. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Additional Twitter Data" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 2340 entries, 0 to 2339\n", "Data columns (total 3 columns):\n", "tweet_id 2340 non-null int64\n", "favorite_count 2340 non-null int64\n", "retweet_count 2340 non-null int64\n", "dtypes: int64(3)\n", "memory usage: 54.9 KB\n" ] } ], "source": [ "tweet_jsons.info()" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " tweet_id favorite_count retweet_count\n", "0 892420643555336193 37855 8260 \n", "1 892177421306343426 32526 6104 \n", "2 891815181378084864 24492 4041 \n", "3 891689557279858688 41205 8410 \n", "4 891327558926688256 39384 9109 \n", "5 891087950875897856 19800 3027 \n", "6 890971913173991426 11572 2001 \n", "7 890729181411237888 63873 18344 \n", "8 890609185150312448 27212 4160 \n", "9 890240255349198849 31203 7176 \n", "10 890006608113172480 29986 7127 \n", "11 889880896479866881 27193 4837 \n", "12 889665388333682689 47038 9761 \n", "13 889638837579907072 26530 4400 \n", "14 889531135344209921 14777 2187 \n", "15 889278841981685760 24668 5214 \n", "16 888917238123831296 28474 4379 \n", "17 888804989199671297 24995 4165 \n", "18 888554962724278272 19385 3443 \n", "19 888078434458587136 21270 3395 " ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tweet_jsons.head(20)" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "16" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "twitter_archive.shape[0] - tweet_jsons.shape[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Quality Issues in Additional Twitter Data\n", "\n", "- Missing information for 16 tweets in Twitter Archive Data: Twitter returned \"No status found with that ID\" message." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "# Data Cleaning" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Define\n", "\n", "The following steps need to be taken to clean and combine the data for further analysis. \n", "\n", "1. Identify and exclude the rows in `twitter_archive` dataframe that correspond to retweets and replies. \n", "2. 
Exclude `in_reply_to_status_id`, `in_reply_to_user_id`, `retweeted_status_id`, `retweeted_status_user_id` and `retweeted_status_timestamp` columns. \n", "3. Convert values in `timestamp` column to datetime format. \n", "4. Clean HTML information in `source` column and convert it to category type. \n", "5. Remove duplicated URLs from `expanded_urls` column. \n", "6. Replace `None` values in the dog stage columns and `name` with `pandas` `NaN` values. \n", "7. Check if any names can be extracted from tweets with non-name words in `name` column and add the proper names, if any.\n", "8. Replace other \"non-name\" values in `name` column with `NaN` values.\n", "9. Combine `pupper`, `puppo` and `doggo` columns in one `dog_stages` column. \n", "10. Check `dog_stages` for correctness. \n", "11. Explore the rating numerators and denominators to determine whether the ratings can be corrected or should be excluded. \n", "12. Combine the cleaned `rating_numerator` and `rating_denominator` columns in one `rating` column in float format. \n", "\n", "13. Join `twitter_archive` dataframe with `image_predictions` and `tweet_jsons` dataframes on `tweet_id`/`id` columns, removing the rows whose tweet IDs are not present in all three dataframes. \n", "\n", "---\n", "\n", "### Code & Test" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "# copying the data for cleaning\n", "archive_clean = twitter_archive.copy()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since the other dataframes will not be modified, and assigning the merged result to the copy made above won't affect them, there is no need to duplicate them in memory." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1. Identify and exclude the rows in twitter archive dataframe that correspond to retweets and replies. 
" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(2097, 17)" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mask = archive_clean.in_reply_to_status_id.isnull() & archive_clean.retweeted_status_id.isnull()\n", "\n", "archive_clean = archive_clean.loc[mask, ]\n", "archive_clean.shape" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Int64Index: 2097 entries, 0 to 2355\n", "Data columns (total 17 columns):\n", "tweet_id 2097 non-null int64\n", "in_reply_to_status_id 0 non-null float64\n", "in_reply_to_user_id 0 non-null float64\n", "timestamp 2097 non-null object\n", "source 2097 non-null object\n", "text 2097 non-null object\n", "retweeted_status_id 0 non-null float64\n", "retweeted_status_user_id 0 non-null float64\n", "retweeted_status_timestamp 0 non-null object\n", "expanded_urls 2094 non-null object\n", "rating_numerator 2097 non-null int64\n", "rating_denominator 2097 non-null int64\n", "name 2097 non-null object\n", "doggo 2097 non-null object\n", "floofer 2097 non-null object\n", "pupper 2097 non-null object\n", "puppo 2097 non-null object\n", "dtypes: float64(4), int64(3), object(10)\n", "memory usage: 294.9+ KB\n" ] } ], "source": [ "# test\n", "archive_clean.info()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "2. Exclude 'in_reply_to_status_id', 'in_reply_to_user_id', 'retweeted_status_id', 'retweeted_status_user_id' and 'retweeted_status_timestamp' columns." 
] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(2097, 12)" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "archive_clean = archive_clean.dropna(axis = 1, how = 'all')\n", "archive_clean.shape" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Int64Index: 2097 entries, 0 to 2355\n", "Data columns (total 12 columns):\n", "tweet_id 2097 non-null int64\n", "timestamp 2097 non-null object\n", "source 2097 non-null object\n", "text 2097 non-null object\n", "expanded_urls 2094 non-null object\n", "rating_numerator 2097 non-null int64\n", "rating_denominator 2097 non-null int64\n", "name 2097 non-null object\n", "doggo 2097 non-null object\n", "floofer 2097 non-null object\n", "pupper 2097 non-null object\n", "puppo 2097 non-null object\n", "dtypes: int64(3), object(9)\n", "memory usage: 213.0+ KB\n" ] } ], "source": [ "# test\n", "archive_clean.info()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "3. Convert values in 'timestamp' column to datetime format. " ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [], "source": [ "archive_clean.timestamp = pd.to_datetime(archive_clean.timestamp)" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [], "source": [ "# test\n", "\n", "assert archive_clean.timestamp.dtype == 'datetime64[ns]'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "--- " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "4. Clean HTML information in 'source' column and convert it to category type." 
] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1070 Twitter for iPhone\n", "1716 Twitter for iPhone\n", "1087 Twitter for iPhone\n", "Name: source, dtype: object" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# strip the opening <a ...> tag and the closing </a> tag, keeping only the link text\n", "archive_clean.source = archive_clean.source.replace(r'^<a[^>]*>', '', regex = True)\n", "archive_clean.source = archive_clean.source.replace(r'</a>$', '', regex = True)\n", "archive_clean.source.sample(3) " ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Twitter for iPhone 1964\n", "Vine - Make a Scene 91 \n", "Twitter Web Client 31 \n", "TweetDeck 11 \n", "Name: source, dtype: int64" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "archive_clean.source.value_counts()" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [], "source": [ "archive_clean.source = archive_clean.source.astype('category')" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [], "source": [ "# test\n", "\n", "assert archive_clean.source.dtype == 'category'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "5. 
Remove duplicated URLs from 'expanded_urls' column" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "0 https://twitter.com/dog_rates/status/892420643555336193/photo/1 \n", "1 https://twitter.com/dog_rates/status/892177421306343426/photo/1 \n", "2 https://twitter.com/dog_rates/status/891815181378084864/photo/1 \n", "3 https://twitter.com/dog_rates/status/891689557279858688/photo/1 \n", "4 https://twitter.com/dog_rates/status/891327558926688256/photo/1,https://twitter.com/dog_rates/status/891327558926688256/photo/1\n", "Name: expanded_urls, dtype: object" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "archive_clean[archive_clean.expanded_urls.notnull()].expanded_urls.head()" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [], "source": [ "archive_clean.expanded_urls = archive_clean.apply(lambda x: \n", " ', '.join(set(x['expanded_urls'].split(','))) \n", " if pd.notnull(x['expanded_urls']) else x['expanded_urls'], \n", " axis = 1)" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 https://twitter.com/dog_rates/status/892420643555336193/photo/1\n", "1 https://twitter.com/dog_rates/status/892177421306343426/photo/1\n", "2 https://twitter.com/dog_rates/status/891815181378084864/photo/1\n", "3 https://twitter.com/dog_rates/status/891689557279858688/photo/1\n", "4 https://twitter.com/dog_rates/status/891327558926688256/photo/1\n", "Name: expanded_urls, dtype: object" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# test\n", "\n", "archive_clean[archive_clean.expanded_urls.notnull()].expanded_urls.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "6. Replace \"None\" values in dog stages columns and 'name' columns with NaN values." 
] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [], "source": [ "archive_clean.iloc[: , -5:] = archive_clean.iloc[: , -5:].replace('None', np.nan)" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Int64Index: 2097 entries, 0 to 2355\n", "Data columns (total 12 columns):\n", "tweet_id 2097 non-null int64\n", "timestamp 2097 non-null datetime64[ns]\n", "source 2097 non-null category\n", "text 2097 non-null object\n", "expanded_urls 2094 non-null object\n", "rating_numerator 2097 non-null int64\n", "rating_denominator 2097 non-null int64\n", "name 1494 non-null object\n", "doggo 83 non-null object\n", "floofer 10 non-null object\n", "pupper 230 non-null object\n", "puppo 24 non-null object\n", "dtypes: category(1), datetime64[ns](1), int64(3), object(7)\n", "memory usage: 198.8+ KB\n" ] } ], "source": [ "# test\n", "\n", "archive_clean.info()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "7. Check if any names can be extracted from tweets with non-name words in 'name' column and add the proper names, if any.\n", "\n", "8. Replace other non-name words in 'name' column with NaN." 
] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Names 1390\n", "a 55 \n", "the 8 \n", "an 6 \n", "very 4 \n", "one 4 \n", "quite 3 \n", "just 3 \n", "actually 2 \n", "not 2 \n", "getting 2 \n", "my 1 \n", "officially 1 \n", "old 1 \n", "infuriating 1 \n", "light 1 \n", "all 1 \n", "unacceptable 1 \n", "this 1 \n", "space 1 \n", "mad 1 \n", "life 1 \n", "by 1 \n", "such 1 \n", "his 1 \n", "incredibly 1 \n", "dtype: int64" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "archive_clean[archive_clean.name.notnull()].apply(lambda x: x['name'] \n", " if x['name'][0].islower() else \"Names\", \n", " axis = 1).value_counts()" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'a, the, an, very, one, quite, just, actually, not, getting, my, officially, old, infuriating, light, all, unacceptable, this, space, mad, life, by, such, his, incredibly'" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "not_names = (archive_clean[archive_clean.name.notnull()].apply(lambda x: x['name'] \n", " if x['name'][0].islower() else \"Names\", \n", " axis = 1).value_counts() < 60).index.tolist()[1:]\n", "\", \".join(not_names)" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "22 I've yet to rate a Venezuelan Hover Wiener. This is such an honor. 14/10 paw-inspiring af (IG: roxy.thedoxy) https://t.co/20VrLAA8ba\n", "56 Here is a pupper approaching maximum borkdrive. Zooming at never before seen speeds. 14/10 paw-inspiring af \n", "(IG: puffie_the_chow) https://t.co/ghXBIIeQZF\n", "169 We only rate dogs. This is quite clearly a smol broken polar bear. We'd appreciate if you only send dogs. Thank you... 12/10 https://t.co/g2nSyGenG9\n", "193 Guys, we only rate dogs. This is quite clearly a bulbasaur. 
Please only send dogs. Thank you... 12/10 human used pet, it's super effective https://t.co/Xc7uj1C64x\n", "335 There's going to be a dog terminal at JFK Airport. This is not a drill. 10/10 \n", "https://t.co/dp5h9bCwU7\n", "369 Occasionally, we're sent fantastic stories. This is one of them. 14/10 for Grace https://t.co/bZ4axuH6OK\n", "542 We only rate dogs. Please stop sending in non-canines like this Freudian Poof Lion. This is incredibly frustrating... 11/10 https://t.co/IZidSrBvhi\n", "649 Here is a perfect example of someone who has their priorities in order. 13/10 for both owner and Forrest https://t.co/LRyMrU7Wfq\n", "801 Guys this is getting so out of hand. We only rate dogs. This is a Galapagos Speed Panda. Pls only send dogs... 10/10 https://t.co/8lpAGaZRFn\n", "819 We only rate dogs. Pls stop sending in non-canines like this Arctic Floof Kangaroo. This is very frustrating. 11/10 https://t.co/qlUDuPoE3d\n", "852 This is my dog. Her name is Zoey. She knows I've been rating other dogs. She's not happy. 13/10 no bias at all https://t.co/ep1NkYoiwB\n", "924 This is one of the most inspirational stories I've ever come across. I have no words. 14/10 for both doggo and owner https://t.co/I5ld3eKD5k\n", "988 What jokester sent in a pic without a dog in it? This is not @rock_rates. This is @dog_rates. Thank you ...10/10 https://t.co/nDPaYHrtNX\n", "992 That is Quizno. This is his beach. He does not tolerate human shenanigans on his beach. 10/10 reclaim ur land doggo https://t.co/vdr7DaRSa7\n", "993 This is one of the most reckless puppers I've ever seen. How she got a license in the first place is beyond me. 6/10 https://t.co/z5bAdtn9kd\n", "1002 This is a mighty rare blue-tailed hammer sherk. Human almost lost a limb trying to take these. Be careful guys. 8/10 https://t.co/TGenMeXreW\n", "1004 Viewer discretion is advised. This is a terrible attack in progress. Not even in water (tragic af). 4/10 bad sherk https://t.co/L3U0j14N5R\n", "1017 This is a carrot. 
We only rate dogs. Please only send in dogs. You all really should know this by now ...11/10 https://t.co/9e48aPrBm2\n", "1025 This is an Iraqi Speed Kangaroo. It is not a dog. Please only send in dogs. I'm very angry with all of you ...9/10 https://t.co/5qpBTTpgUt\n", "1031 We only rate dogs. Pls stop sending in non-canines like this Jamaican Flop Seal. This is very very frustrating. 9/10 https://t.co/nc53zEN0hZ\n", "1040 This is actually a pupper and I'd pet it so well. 12/10\n", "https://t.co/RNqS7C4Y4N\n", "1049 This is a very rare Great Alaskan Bush Pupper. Hard to stumble upon without spooking. 12/10 would pet passionately https://t.co/xOBKCdpzaa\n", "1063 This is just downright precious af. 12/10 for both pupper and doggo https://t.co/o5J479bZUC\n", "1071 This is getting incredibly frustrating. This is a Mexican Golden Beaver. We only rate dogs. Only send dogs ...10/10 https://t.co/0yolOOyD3X\n", "1095 Say hello to mad pupper. You know what you did. 13/10 would pet until no longer furustrated https://t.co/u1ulQ5heLX\n", "1097 We only rate dogs. Please stop sending in non-canines like this Alaskan Flop Turtle. This is very frustrating. 10/10 https://t.co/qXteK6Atxc\n", "1120 Say hello to this unbelievably well behaved squad of doggos. 204/170 would try to pet all at once https://t.co/yGQI3He3xv\n", "1121 We only rate dogs. Pls stop sending non-canines like this Bulgarian Eyeless Porch Bear. This is unacceptable... 9/10 https://t.co/2yctWAUZ3Z\n", "1138 This is all I want in my life. 12/10 for super sleepy pupper https://t.co/4RlLA5ObMh\n", "1193 People please. This is a Deadly Mediterranean Plop T-Rex. We only rate dogs. Only send in dogs. Thanks you... 11/10 https://t.co/2ATDsgHD4n\n", "1206 This is old now but it's absolutely heckin fantastic and I can't not share it with you all. 13/10 https://t.co/wJX74TSgzP\n", "1207 This is a taco. We only rate dogs. Please only send in dogs. Dogs are what we rate. Not tacos. Thank you... 
10/10 https://t.co/cxl6xGY8B9\n", "1259 We 👏🏻 only 👏🏻 rate 👏🏻 dogs. Pls stop sending in non-canines like this Dutch Panda Worm. This is infuriating. 11/10 https://t.co/odfLzBonG2\n", "1340 Here is a heartbreaking scene of an incredible pupper being laid to rest. 10/10 RIP pupper https://t.co/81mvJ0rGRu\n", "1351 Here is a whole flock of puppers. 60/50 I'll take the lot https://t.co/9dpcw6MdWa\n", "1361 This is a Butternut Cumberfloof. It's not windy they just look like that. 11/10 back at it again with the red socks https://t.co/hMjzhdUHaW\n", "1362 This is an East African Chalupa Seal. We only rate dogs. Please only send in dogs. Thank you... 10/10 https://t.co/iHe6liLwWR\n", "1368 This is a Wild Tuscan Poofwiggle. Careful not to startle. Rare tongue slip. One eye magical. 12/10 would def pet https://t.co/4EnShAQjv6\n", "1382 \"Pupper is a present to world. Here is a bow for pupper.\" 12/10 precious as hell https://t.co/ItSsE92gCW\n", "1385 We only rate dogs. Pls stop sending in non-canines like this Mongolian grass snake. This is very frustrating. 11/10 https://t.co/22x9SbCYCU\n", "1435 Please stop sending in saber-toothed tigers. This is getting ridiculous. We only rate dogs.\n", "...8/10 https://t.co/iAeQNueou8\n", "1457 This is just a beautiful pupper good shit evolution. 12/10 https://t.co/2L8pI0Z2Ib\n", "1499 This is a rare Arctic Wubberfloof. Unamused by the happenings. No longer has the appetites. 12/10 would totally hug https://t.co/krvbacIX0N\n", "1527 Stop sending in lobsters. This is the final warning. We only rate dogs. Thank you... 9/10 https://t.co/B9ZXXKJYNx\n", "1603 This is the newly formed pupper a capella group. They're just starting out but I see tons of potential. 8/10 for all https://t.co/wbAcvFoNtn\n", "1693 This is actually a lion. We only rate dogs. For the last time please only send dogs. Thank u.\n", "12/10 would still pet https://t.co/Pp26dMQxap\n", "1724 This is by far the most coordinated series of pictures I was sent. 
Downright impressive in every way. 12/10 for all https://t.co/etzLo3sdZE\n", "1737 Guys this really needs to stop. We've been over this way too many times. This is a giraffe. We only rate dogs.. 7/10 https://t.co/yavgkHYPOC\n", "1747 This is officially the greatest yawn of all time. 12/10 https://t.co/4R0Cc0sLVE\n", "1785 This is a dog swinging. I really enjoyed it so I hope you all do as well. 11/10 https://t.co/Ozo9KHTRND\n", "1797 This is the happiest pupper I've ever seen. 10/10 would trade lives with https://t.co/ep8ATEJwRb\n", "1815 This is the saddest/sweetest/best picture I've been sent. 12/10 😢🐶 https://t.co/vQ2Lw1BLBF\n", "1853 This is a Sizzlin Menorah spaniel from Brooklyn named Wylie. Lovable eyes. Chiller as hell. 10/10 and I'm out.. poof https://t.co/7E0AiJXPmI\n", "1854 Seriously guys?! Only send in dogs. I only rate dogs. This is a baby black bear... 11/10 https://t.co/H7kpabTfLj\n", "1877 C'mon guys. We've been over this. We only rate dogs. This is a cow. Please only submit dogs. Thank you...... 9/10 https://t.co/WjcELNEqN2\n", "1878 This is a fluffy albino Bacardi Columbia mix. Excellent at the tweets. 11/10 would hug gently https://t.co/diboDRUuEI\n", "1916 This is life-changing. 12/10 https://t.co/SroTpI6psB\n", "1923 This is a Sagitariot Baklava mix. Loves her new hat. 11/10 radiant pup https://t.co/Bko5kFJYUU\n", "1936 This is one esteemed pupper. Just graduated college. 10/10 what a champ https://t.co/nyReCVRiyd\n", "1941 This is a heavily opinionated dog. Loves walls. Nobody knows how the hair works. Always ready for a kiss. 4/10 https://t.co/dFiaKZ9cDl\n", "1955 This is a Lofted Aphrodisiac Terrier named Kip. Big fan of bed n breakfasts. Fits perfectly. 10/10 would pet firmly https://t.co/gKlLpNzIl3\n", "1994 This is a baby Rand Paul. Curls for days. 11/10 would cuddle the hell out of https://t.co/xHXNaPAYRe\n", "2001 This is light saber pup. Ready to fight off evil with light saber. 
10/10 true hero https://t.co/LPPa3btIIt\n", "2019 This is just impressive I have nothing else to say. 11/10 https://t.co/LquQZiZjJP\n", "2030 This is space pup. He's very confused. Tries to moonwalk at one point. Super spiffy uniform. 13/10 I love space pup https://t.co/SfPQ2KeLdq\n", "2034 This is a Tuscaloosa Alcatraz named Jacob (Yacōb). Loves to sit in swing. Stellar tongue. 11/10 look at his feet https://t.co/2IslQ8ZSc7\n", "2037 This is the best thing I've ever seen so spread it like wildfire & maybe we'll find the genius who created it. 13/10 https://t.co/q6RsuOVYwU\n", "2066 This is a Helvetica Listerine named Rufus. This time Rufus will be ready for the UPS guy. He'll never expect it 9/10 https://t.co/34OhVhMkVr\n", "2116 This is a Deciduous Trimester mix named Spork. Only 1 ear works. No seat belt. Incredibly reckless. 9/10 still cute https://t.co/CtuJoLHiDo\n", "2125 This is a Rich Mahogany Seltzer named Cherokee. Just got destroyed by a snowball. Isn't very happy about it. 9/10 https://t.co/98ZBi6o4dj\n", "2128 This is a Speckled Cauliflower Yosemite named Hemry. He's terrified of intruder dog. Not one bit comfortable. 9/10 https://t.co/yV3Qgjh8iN\n", "2146 This is a spotted Lipitor Rumpelstiltskin named Alphred. He can't wait for the Turkey. 10/10 would pet really well https://t.co/6GUGO7azNX\n", "2153 This is a brave dog. Excellent free climber. Trying to get closer to God. Not very loyal though. Doesn't bark. 5/10 https://t.co/ODnILTr4QM\n", "2161 This is a Coriander Baton Rouge named Alfredo. Loves to cuddle with smaller well-dressed dog. 10/10 would hug lots https://t.co/eCRdwouKCl\n", "2191 This is a Slovakian Helter Skelter Feta named Leroi. Likes to skip on roofs. Good traction. Much balance. 10/10 wow! https://t.co/Dmy2mY2Qj5\n", "2198 This is a wild Toblerone from Papua New Guinea. Mouth always open. Addicted to hay. Acts blind. 7/10 handsome dog https://t.co/IGmVbz07tZ\n", "2204 This is an Irish Rigatoni terrier named Berta. 
Completely made of rope. No eyes. Quite large. Loves to dance. 10/10 https://t.co/EM5fDykrJg\n", "2211 Here is a horned dog. Much grace. Can jump over moons (dam!). Paws not soft. Bad at barking. 7/10 can still pet tho https://t.co/2Su7gmsnZm\n", "2212 Never forget this vine. You will not stop watching for at least 15 minutes. This is the second coveted.. 13/10 https://t.co/roqIxCvEB3\n", "2218 This is a Birmingham Quagmire named Chuk. Loves to relax and watch the game while sippin on that iced mocha. 10/10 https://t.co/HvNg9JWxFt\n", "2222 Here is a mother dog caring for her pups. Snazzy red mohawk. Doesn't wag tail. Pups look confused. Overall 4/10 https://t.co/YOHe6lf09m\n", "2235 This is a Trans Siberian Kellogg named Alfonso. Huge ass eyeballs. Actually Dobby from Harry Potter. 7/10 https://t.co/XpseHBlAAb\n", "2249 This is a Shotokon Macadamia mix named Cheryl. Sophisticated af. Looks like a disappointed librarian. Shh (lol) 9/10 https://t.co/J4GnJ5Swba\n", "2255 This is a rare Hungarian Pinot named Jessiga. She is either mid-stroke or got stuck in the washing machine. 8/10 https://t.co/ZU0i0KJyqD\n", "2264 This is a southwest Coriander named Klint. Hat looks expensive. Still on house arrest :(\n", "9/10 https://t.co/IQTOMqDUIe\n", "2273 This is a northern Wahoo named Kohl. He runs this town. Chases tumbleweeds. Draws gun wicked fast. 11/10 legendary https://t.co/J4vn2rOYFk\n", "2287 This is a Dasani Kingfisher from Maine. His name is Daryl. Daryl doesn't like being swallowed by a panda. 8/10 https://t.co/jpaeu6LNmW\n", "2304 This is a curly Ticonderoga named Pepe. No feet. Loves to jet ski. 11/10 would hug until forever https://t.co/cyDfaK8NBc\n", "2311 This is a purebred Bacardi named Octaviath. Can shoot spaghetti out of mouth. 10/10 https://t.co/uEvsGLOFHa\n", "2314 This is a golden Buckminsterfullerene named Johm. Drives trucks. Lumberjack (?). Enjoys wall. 8/10 would hug softly https://t.co/uQbZJM2DQB\n", "2326 This is quite the dog. 
Gets really excited when not in water. Not very soft tho. Bad at fetch. Can't do tricks. 2/10 https://t.co/aMCTNWO94t\n", "2327 This is a southern Vesuvius bumblegruff. Can drive a truck (wow). Made friends with 5 other nifty dogs (neat). 7/10 https://t.co/LopTBkKa8h\n", "2333 This is an extremely rare horned Parthenon. Not amused. Wears shoes. Overall very nice. 9/10 would pet aggressively https://t.co/QpRjllzWAL\n", "2334 This is a funny dog. Weird toes. Won't come down. Loves branch. Refuses to eat his food. Hard to cuddle with. 3/10 https://t.co/IIXis0zta0\n", "2335 This is an Albanian 3 1/2 legged Episcopalian. Loves well-polished hardwood flooring. Penis on the collar. 9/10 https://t.co/d9NcXFKwLv\n", "2345 This is the happiest dog you will ever see. Very committed owner. Nice couch. 10/10 https://t.co/RhUEAloehK\n", "2346 Here is the Rand Paul of retrievers folks! He's probably good at poker. Can drink beer (lol rad). 8/10 good dog https://t.co/pYAJkAe76p\n", "2347 My oh my. This is a rare blond Canadian terrier on wheels. Only $8.98. Rather docile. 9/10 very rare https://t.co/yWBqbrzy8O\n", "2348 Here is a Siberian heavily armored polar bear mix. Strong owner. 10/10 I would do unspeakable things to pet this dog https://t.co/rdivxLiqEt\n", "2349 This is an odd dog. Hard on the outside but loving on the inside. Petting still fun. Doesn't play catch well. 2/10 https://t.co/v5A4vzSDdc\n", "2350 This is a truly beautiful English Wilson Staff retriever. Has a nice phone. Privileged. 10/10 would trade lives with https://t.co/fvIbQfHjIe\n", "2352 This is a purebred Piers Morgan. Loves to Netflix and chill. Always looks like he forgot to unplug the iron. 6/10 https://t.co/DWnyCjf2mx\n", "2353 Here is a very happy pup. Big fan of well-maintained decks. Just look at that tongue. 9/10 would cuddle af https://t.co/y671yMhoiR\n", "2354 This is a western brown Mitsubishi terrier. Upset about leaf. Actually 2 dogs here. 
7/10 would walk the shit out of https://t.co/r7mOb2m0UI\n" ] } ], "source": [ "for index, row in archive_clean.iterrows():\n", "    if row['name'] in not_names:\n", "        print(index, row['text'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In some tweets where a \"non-name\" word was extracted, the actual name follows the words \"named\" or \"name is\". These names can be extracted and added to the `name` column. Other values should be replaced with `NaN`." ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Tilly nan Alfonso\n" ] } ], "source": [ "def get_name(x, text):\n", "    \"\"\"\n", "    Extract a dog name from the text field of a tweet \n", "    if a non-name (lowercase) word was extracted previously\n", "    \"\"\"\n", "    split_words = ['named ', 'name is ']\n", "\n", "    # keep missing values and proper (capitalised) names as they are\n", "    if pd.isnull(x) or x[0].isupper():\n", "        return x\n", "    else:\n", "        split_word = \"\"\n", "        if split_words[0] in text:\n", "            split_word = split_words[0]\n", "        elif split_words[1] in text:\n", "            split_word = split_words[1]\n", "        else:\n", "            return np.nan\n", "\n", "        if split_word:\n", "            # the name is the first word after the split word, without the trailing period\n", "            name = text.split(split_word)[1].split(' ')[0].replace('.', '')\n", "\n", "    return name\n", "\n", "# Function test \n", "\n", "print(get_name(archive_clean.name[1], archive_clean.text[1]), # Name\n", "      get_name(archive_clean.name[1878], archive_clean.text[1878]), # No name in text\n", "      get_name(archive_clean.name[2235], archive_clean.text[2235])) # Article instead of name" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [], "source": [ "archive_clean.name = archive_clean.apply(lambda x: get_name(x['name'], x['text']), \n", " axis = 1)" ] }, { "cell_type": "code", "execution_count": 55, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "Lucy 11\n", "Charlie 11\n", "Cooper 10\n", "Oliver 10\n", "Tucker 9 \n", "Penny 9 \n", "Winston 8 \n", "Lola 8 \n", "Sadie 8 
\n", "Toby 7 \n", "Daisy 7 \n", "Stanley 6 \n", "Oscar 6 \n", "Bo 6 \n", "Bailey 6 \n", "Jax 6 \n", "Koda 6 \n", "Bella 6 \n", "Leo 5 \n", "Scout 5 \n", "Chester 5 \n", "Buddy 5 \n", "Dave 5 \n", "Louis 5 \n", "Milo 5 \n", "Bentley 5 \n", "Rusty 5 \n", "Archie 4 \n", "Gus 4 \n", "Winnie 4 \n", " .. \n", "Chloe 1 \n", "Milky 1 \n", "Shaggy 1 \n", "Hercules 1 \n", "Darby 1 \n", "Skittle 1 \n", "Brady 1 \n", "Peanut 1 \n", "Flash 1 \n", "Harnold 1 \n", "Geoff 1 \n", "Ed 1 \n", "Vixen 1 \n", "Derby 1 \n", "Charleson 1 \n", "Rorie 1 \n", "Jerome 1 \n", "Rodney 1 \n", "Champ 1 \n", "Shiloh 1 \n", "Ebby 1 \n", "Kulet 1 \n", "Iggy 1 \n", "Marlee 1 \n", "Rooney 1 \n", "Covach 1 \n", "Blue 1 \n", "Obie 1 \n", "Burt 1 \n", "Edmund 1 \n", "Name: name, Length: 947, dtype: int64" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "archive_clean.name.value_counts()" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [], "source": [ "# test\n", "\n", "assert len(archive_clean[archive_clean.name.notnull()].apply(lambda x: x['name'] \n", " if x['name'][0].islower() else \"Names\", \n", " axis = 1).value_counts().index.tolist()) == 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "9. Combine 'pupper', 'puppo' and 'doggo' columns in one 'dog_stages' column." ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " pupper puppo doggo\n", "1589 pupper \n", "1691 \n", "2102 \n", "816 \n", "502 \n", "636 \n", "94 puppo \n", "400 \n", "1997 \n", "1094 " ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "archive_clean[['pupper', 'puppo', 'doggo']] = archive_clean[['pupper', 'puppo', 'doggo']].fillna('')\n", "archive_clean[['pupper', 'puppo', 'doggo']].sample(10)" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [], "source": [ "archive_clean['dog_stages'] = archive_clean.pupper.astype(str) + ',' + archive_clean.puppo +',' + archive_clean.doggo\n", "\n", "archive_clean.dog_stages = archive_clean.dog_stages.replace(\",,\", np.nan)\n", "archive_clean.iloc[: , -5:-1] = archive_clean.iloc[: , -5:-1].replace('', np.nan)" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " doggo floofer pupper puppo dog_stages\n", "542 NaN NaN NaN NaN NaN \n", "259 NaN NaN NaN NaN NaN \n", "29 NaN NaN pupper NaN pupper,, \n", "802 NaN NaN pupper NaN pupper,, \n", "412 NaN NaN NaN NaN NaN " ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "archive_clean.iloc[: , -5:].sample(5)" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [], "source": [ "archive_clean.dog_stages = archive_clean.dog_stages.str.strip(\",\").replace(',,', ',', regex = True)" ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "pupper 221\n", "doggo 73 \n", "puppo 23 \n", "pupper,doggo 9 \n", "puppo,doggo 1 \n", "Name: dog_stages, dtype: int64" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "archive_clean.dog_stages.value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "10. Check 'dog_stages' for correctness." ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "191 Here's a puppo participating in the #ScienceMarch. Cleverly disguising her own doggo agenda. 13/10 would keep the planet habitable for https://t.co/cMhq16isel\n", "Name: text, dtype: object" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mask = archive_clean.dog_stages == 'puppo,doggo'\n", "\n", "archive_clean[mask].text" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As can be seen from the text, the stage sould be set to 'puppo'." 
] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "pupper 221\n", "doggo 73 \n", "puppo 24 \n", "pupper,doggo 9 \n", "Name: dog_stages, dtype: int64" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.options.mode.chained_assignment = None\n", "\n", "# .loc writes to the dataframe itself rather than to a possible copy\n", "archive_clean.loc[191, 'dog_stages'] = 'puppo'\n", "\n", "archive_clean.dog_stages.value_counts()" ] }, { "cell_type": "code", "execution_count": 65, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "460 This is Dido. She's playing the lead role in \"Pupper Stops to Catch Snow Before Resuming Shadow Box with Dried Apple.\" 13/10 (IG: didodoggo) https://t.co/m7isZrOBX7\n", "531 Here we have Burke (pupper) and Dexter (doggo). Pupper wants to be exactly like doggo. Both 12/10 would pet at same time https://t.co/ANBpEYHaho \n", "575 This is Bones. He's being haunted by another doggo of roughly the same size. 12/10 deep breaths pupper everything's fine https://t.co/55Dqe0SJNj \n", "705 This is Pinot. He's a sophisticated doggo. You can tell by the hat. Also pointier than your average pupper. Still 10/10 would pet cautiously https://t.co/f2wmLZTPHd\n", "733 Pupper butt 1, Doggo 0. Both 12/10 https://t.co/WQvcPEpH2u \n", "889 Meet Maggie & Lila. Maggie is the doggo, Lila is the pupper. They are sisters. Both 12/10 would pet at the same time https://t.co/MYwR4DQKll \n", "956 Please stop sending it pictures that don't even have a doggo or pupper in them. Churlish af. 5/10 neat couch tho https://t.co/u2c9c7qSg8 \n", "1063 This is just downright precious af. 12/10 for both pupper and doggo https://t.co/o5J479bZUC \n", "1113 Like father (doggo), like son (pupper). 
Both 12/10 https://t.co/pG2inLaOda \n", "Name: text, dtype: object" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mask = archive_clean.dog_stages == 'pupper,doggo'\n", "\n", "archive_clean[mask].text" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For the indexes above: \n", "460 - no stage \n", "531 - two dogs \n", "575 - pupper \n", "705 - doggo in text, but actually a hedgehog \n", "733 - two dogs \n", "889 - two dogs \n", "956 - doggo in picture \n", "1063 - two dogs \n", "1113 - two dogs \n", "\n", "The five tweets with two dogs keep the combined 'pupper,doggo' stage." ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [], "source": [ "# .loc writes to the dataframe itself rather than to a possible copy\n", "archive_clean.loc[460, 'dog_stages'] = np.nan\n", "archive_clean.loc[575, 'dog_stages'] = 'pupper'\n", "archive_clean.loc[705, 'dog_stages'] = np.nan\n", "archive_clean.loc[956, 'dog_stages'] = 'doggo'" ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "pupper 222\n", "doggo 74 \n", "puppo 24 \n", "pupper,doggo 5 \n", "Name: dog_stages, dtype: int64" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "archive_clean.dog_stages.value_counts()" ] }, { "cell_type": "code", "execution_count": 68, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "9 This is Cassie. She is a college pup. Studying international doggo communication and stick theory. 14/10 so elegant much sophisticate https://t.co/t1bfwz5S2A \n", "43 Meet Yogi. He doesn't have any important dog meetings today he just enjoys looking his best at all times. 12/10 for dangerously dapper doggo https://t.co/YSI00BzTBZ \n", "99 Here's a very large dog. He has a date later. Politely asked this water person to check if his breath is bad. 12/10 good to go doggo https://t.co/EMYIdoblMR \n", "108 This is Napolean. He's a Raggedy East Nicaraguan Zoom Zoom. Runs on one leg. Built for deception. No eyes. Good with kids. 
12/10 great doggo https://t.co/PR7B7w1rUw \n", "110 Never doubt a doggo 14/10 https://t.co/AbBLh2FZCH \n", "121 This is Scout. He just graduated. Officially a doggo now. Have fun with taxes and losing sight of your ambitions. 12/10 would throw cap for https://t.co/DsA2hwXAJo \n", "172 I have stumbled puppon a doggo painting party. They're looking to be the next Pupcasso or Puppollock. All 13/10 would put it on the fridge https://t.co/cUeDMlHJbq \n", "200 At first I thought this was a shy doggo, but it's actually a Rare Canadian Floofer Owl. Amateurs would confuse the two. 11/10 only send dogs https://t.co/TXdT3tmuYk \n", "240 This is Barney. He's an elder doggo. Hitches a ride when he gets tired. Waves goodbye before he leaves. 13/10 please come back soon https://t.co/cFAasDXauK \n", "248 Say hello to Mimosa. She's an emotional support doggo who helps her owner with PTSD. 13/10, but she needs your help\\n\\nhttps://t.co/L6mLzrd7Mx https://t.co/jMutBFdw5o\n", "300 This is Meera. She just heard about taxes and how much a doghouse in a nice area costs. Not pupared to be a doggo anymore. 12/10 https://t.co/GZmNEdyoJY \n", "318 Here's a doggo fully pupared for a shower. H*ckin exquisite balance. Sneaky tongue slip too. 13/10 https://t.co/UtEVnQ1ZPg \n", "323 DOGGO ON THE LOOSE I REPEAT DOGGO ON THE LOOSE 10/10 https://t.co/ffIH2WxwF0 \n", "331 This is Rhino. He arrived at a shelter with an elaborate doggo manual for his new family, written by someone who will always love him. 13/10 https://t.co/QX1h0oqMz0 \n", "339 Say hello to Smiley. He's a blind therapy doggo having a h*ckin blast high steppin around in the snow. 14/10 would follow anywhere https://t.co/SHAb1wHjMz \n", "344 This is Miguel. He was the only remaining doggo at the adoption center after the weekend. Let's change that. 12/10\\n\\nhttps://t.co/P0bO8mCQwN https://t.co/SU4K34NT4M \n", "345 This is Emanuel. He's a h*ckin rare doggo. Dwells in a semi-urban environment. 
Round features make him extra collectible. 12/10 would so pet https://t.co/k9bzgyVdUT \n", "351 This is Pete. He has no eyes. Needs a guide doggo. Also appears to be considerably fluffy af. 12/10 would hug softly https://t.co/Xc0gyovCtK \n", "362 Here's a stressed doggo. Had a long day. Many things on her mind. The hat communicates these feelings exquisitely. 11/10 https://t.co/fmRS43mWQB \n", "363 This is Astrid. She's a guide doggo in training. 13/10 would follow anywhere https://t.co/xo7FZFIAao \n", "372 Meet Doobert. He's a deaf doggo. Didn't stop him on the field tho. Absolute legend today. 14/10 would pat head approvingly https://t.co/iCk7zstRA9 \n", "384 This is Loki. He smiles like Elvis. Ain't nothin but a hound doggo. 12/10 https://t.co/QV5nx6otZR \n", "385 This is Cupid. He was found in the trash. Now he's well on his way to prosthetic front legs and a long happy doggo life. 13/10 heroic af https://t.co/WS0Gha8vRh \n", "389 This is Pilot. He has mastered the synchronized head tilt and sneaky tongue slip. Usually not unlocked until later doggo days. 12/10 https://t.co/YIV8sw8xkh \n", "391 Here's a little more info on Dew, your favorite roaming doggo that went h*ckin viral. 13/10 \\nhttps://t.co/1httNYrCeW https://t.co/KvaM8j3jhX \n", "423 This is Duchess. She uses dark doggo forces to levitate her toys. 13/10 magical af https://t.co/maDNMETA52 \n", "426 This is Sundance. He's a doggo drummer. Even sings a bit on the side. 14/10 entertained af (vid by @sweetsundance) https://t.co/Xn5AQtiqzG \n", "429 Here's a doggo who looks like he's about to give you a list of mythical ingredients to go collect for his potion. 11/10 would obey https://t.co/8SiwKDlRcl \n", "440 Here we have a doggo who has messed up. He was hoping you wouldn't notice. 11/10 someone help him https://t.co/XdRNXNYD4E \n", "448 This is Sunny. She was also a very good First Doggo. 14/10 would also be an absolute honor to pet https://t.co/YOC1fHFCSb \n", " ... \n", "780 This is Anakin. 
He strives to reach his full doggo potential. Born with blurry tail tho. 11/10 would still pet well https://t.co/9CcBSxCXXG \n", "782 This is Finley. He's an independent doggo still adjusting to life on his own. 11/10 https://t.co/7FNcBaKbci \n", "807 Doggo will persevere. 13/10\\nhttps://t.co/yOVzAomJ6k \n", "835 Meet Gerald. He's a fairly exotic doggo. Floofy af. Inadequate knees tho. Self conscious about large forehead. 8/10 https://t.co/WmczvjCWJq \n", "839 I don't know any of the backstory behind this picture but for some reason I'm crying. 13/10 for owner and doggo https://t.co/QOKZdus9TT \n", "877 This is Wishes. He has the day off. Daily struggles of being a doggo have finally caught up with him. 11/10 https://t.co/H9YgrUkYwa \n", "881 Doggo want what doggo cannot have. Temptation strong, dog stronger. 12/10 https://t.co/IqyTF6qik6 \n", "899 This doggo is just waiting for someone to be proud of her and her accomplishment. 13/10 legendary af https://t.co/9T2h14yn4Q \n", "914 Here's a doggo completely oblivious to the double rainbow behind him. 10/10 someone tell him https://t.co/OfvRoD6ndV \n", "919 All hail sky doggo. 13/10 would jump super high to pet https://t.co/CsLRpqdeTF \n", "924 This is one of the most inspirational stories I've ever come across. I have no words. 14/10 for both doggo and owner https://t.co/I5ld3eKD5k \n", "944 Nothing better than a doggo and a sunset. 10/10 majestic af https://t.co/xVSodF19PS \n", "945 Hooman used Pokeball\\n*wiggle*\\n*wiggle*\\nDoggo broke free \\n10/10 https://t.co/bWSgqnwSHr \n", "948 Here's a doggo trying to catch some fish. 8/10 futile af (vid by @KellyBauerx) https://t.co/jwd0j6oWLE \n", "956 Please stop sending it pictures that don't even have a doggo or pupper in them. Churlish af. 5/10 neat couch tho https://t.co/u2c9c7qSg8 \n", "977 Meet Piper. She's an airport doggo. Please return your tray table to its full pupright and locked position. 11/10 https://t.co/D17IAcetmM \n", "985 This is Boomer. 
He's self-baptizing. Other doggo not ready to renounce sins. 11/10 spiritually awakened af https://t.co/cRTJiQQk9o \n", "989 Say hello to Divine Doggo. Must be magical af. 13/10 would be an honor to pet https://t.co/BbcABzohKb \n", "992 That is Quizno. This is his beach. He does not tolerate human shenanigans on his beach. 10/10 reclaim ur land doggo https://t.co/vdr7DaRSa7 \n", "1030 This is Lenox. She's in a wheelbarrow. Silly doggo. You don't belong there. 10/10 would push around https://t.co/oYbVR4nBsR \n", "1039 Here's a doggo realizing you can stand in a pool. 13/10 enlightened af (vid by Tina Conrad) https://t.co/7wE9LTEXC4 \n", "1051 For anyone who's wondering, this is what happens after a doggo catches it's tail... 11/10 https://t.co/G4fNhzelDv \n", "1075 Here's a doggo that don't need no human. 12/10 independent af (vid by @MichelleLiuCee) https://t.co/vdgtdb6rON \n", "1079 Here's a doggo blowing bubbles. It's downright legendary. 13/10 would watch on repeat forever (vid by Kent Duryee) https://t.co/YcXgHfp1EC \n", "1103 This is Kellogg. He accidentally opened the front facing camera. 8/10 get it together doggo https://t.co/MRYv7nDPyS \n", "1117 This is Kyle (pronounced 'Mitch'). He strives to be the best doggo he can be. 11/10 would pat on head approvingly https://t.co/aA2GiTGvlE \n", "1141 Here's a doggo struggling to cope with the winds. 13/10 https://t.co/qv3aUwaouT \n", "1156 Nothin better than a doggo and a sunset. 11/10 https://t.co/JlFqOhrHEs \n", "1176 This doggo was initially thrilled when she saw the happy cartoon pup but quickly realized she'd been deceived. 
10/10 https://t.co/mvnBGaWULV \n", "1204 Here's a super majestic doggo and a sunset 11/10 https://t.co/UACnoyi8zu \n", "Name: text, Length: 74, dtype: object" ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mask = archive_clean.dog_stages == 'doggo'\n", "\n", "archive_clean[mask].text" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Of the following tweets: \n", "363 This is Astrid. She's a guide doggo in training. 13/10 would follow anywhere https://t.co/xo7FZFIAao \n", "389 This is Pilot. He has mastered the synchronized head tilt and sneaky tongue slip. Usually not unlocked until later doggo days. 12/10 https://t.co/YIV8sw8xkh \n", "992 That is Quizno. This is his beach. He does not tolerate human shenanigans on his beach. 10/10 reclaim ur land doggo https://t.co/vdr7DaRSa7 \n", "\n", "363 is a pupper, 389 is a puppo and 992 is a horse." ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [], "source": [ "# .loc writes to the dataframe itself rather than to a possible copy\n", "archive_clean.loc[363, 'dog_stages'] = 'pupper'\n", "archive_clean.loc[389, 'dog_stages'] = 'puppo'\n", "archive_clean.loc[992, 'dog_stages'] = np.nan" ] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "pupper 223\n", "doggo 71 \n", "puppo 25 \n", "pupper,doggo 5 \n", "Name: dog_stages, dtype: int64" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "archive_clean.dog_stages.value_counts()" ] }, { "cell_type": "code", "execution_count": 71, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "12 Here's a puppo that seems to be on the fence about something haha no but seriously someone help her. 13/10 https://t.co/BxvuXk0UCm \n", "14 This is Stuart. He's sporting his favorite fanny pack. Secretly filled with bones only. 13/10 puppared puppo #BarkWeek https://t.co/y70o6h3isq \n", "71 This is Snoopy. He's a proud #PrideMonthPuppo. Impeccable handwriting for not having thumbs. 
13/10 would love back #PrideMonth https://t.co/lNZwgNO4gS \n", "94 This is Sebastian. He can't see all the colors of the rainbow, but he can see that this flag makes his human happy. 13/10 #PrideMonth puppo https://t.co/XBE0evJZ6V \n", "129 This is Shikha. She just watched you drop a skittle on the ground and still eat it. Could not be less impressed. 12/10 superior puppo https://t.co/XZlZKd73go \n", "168 Sorry for the lack of posts today. I came home from school and had to spend quality time with my puppo. Her name is Zoey and she's 13/10 https://t.co/BArWupFAn0 \n", "191 Here's a puppo participating in the #ScienceMarch. Cleverly disguising her own doggo agenda. 13/10 would keep the planet habitable for https://t.co/cMhq16isel \n", "389 This is Pilot. He has mastered the synchronized head tilt and sneaky tongue slip. Usually not unlocked until later doggo days. 12/10 https://t.co/YIV8sw8xkh \n", "395 Here's a very loving and accepting puppo. Appears to have read her Constitution well. 14/10 would pat head approvingly https://t.co/6ao80wIpV1 \n", "398 Say hello to Pablo. He's one gorgeous puppo. A true 12/10. Click the link to see why Pablo requests your assistance\\n\\nhttps://t.co/koHvVQp9bL https://t.co/IhW0JKf7kc\n", "413 Here's a super supportive puppo participating in the Toronto #WomensMarch today. 13/10 https://t.co/nTz3FtorBc \n", "439 This is Oliver. He has dreams of being a service puppo so he can help his owner. 13/10 selfless af\\n\\nmake it happen:\\nhttps://t.co/f5WMsx0a9K https://t.co/6lJz0DKZIb\n", "554 This is Diogi. He fell in the pool as soon as he was brought home. Clumsy puppo. 12/10 would pet until dry https://t.co/ZxeRjMKaWt \n", "567 This is Loki. He'll do your taxes for you. Can also make room in your budget for all the things you bought today. 12/10 what a puppo https://t.co/5oWrHCWg87 \n", "643 Say hello to Lily. She's pupset that her costume doesn't fit as well as last year. 
12/10 poor puppo https://t.co/YSi6K1firY \n", "663 This is Betty. She's assisting with the dishes. Such a good puppo. 12/10 h*ckin helpful af https://t.co/dgvTPZ9tgI \n", "689 This is Tonks. She is a service puppo. Can hear a caterpillar hiccup from 7 miles away. 13/10 would follow anywhere https://t.co/i622ZbWkUp \n", "713 This is Reginald. He's one magical puppo. Aerodynamic af. 12/10 would catch https://t.co/t0cEeRbcXJ \n", "736 I want to finally rate this iconic puppo who thinks the parade is all for him. 13/10 would absolutely attend https://t.co/5dUYOu4b8d \n", "922 When ur older siblings get to play in the deep end but dad says ur not old enough. Maybe one day puppo. All 10/10 https://t.co/JrDAzMhwG9 \n", "947 Hopefully this puppo on a swing will help get you through your Monday. 11/10 would push https://t.co/G54yClasz2 \n", "961 This is Cooper. He's just so damn happy. 10/10 what's your secret puppo? https://t.co/yToDwVXEpA \n", "1035 This is Abby. She got her face stuck in a glass. Churlish af. 9/10 rookie move puppo https://t.co/2FPb45NXrK \n", "1048 This is Kilo. He cannot reach the snackum. Nifty tongue, but not nifty enough. 10/10 maybe one day puppo https://t.co/gSmp31Zrsx \n", "1083 This is Bayley. She fell asleep trying to escape her evil fence enclosure. 11/10 night night puppo https://t.co/AxSiqAKEKu \n", "Name: text, dtype: object" ] }, "execution_count": 71, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mask = archive_clean.dog_stages == 'puppo'\n", "\n", "archive_clean[mask].text" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In these tweets the word \"puppo\" seems to be meaningful." ] }, { "cell_type": "code", "execution_count": 72, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "29 This is Roscoe. Another pupper fallen victim to spontaneous tongue ejections. Get the BlepiPen immediate. 12/10 deep breaths Roscoe https://t.co/RGE08MIJox \n", "49 This is Gus. 
He's quite the cheeky pupper. Already perfected the disinterested wink. 12/10 would let steal my girl https://t.co/D43I96SlVu \n", "56 Here is a pupper approaching maximum borkdrive. Zooming at never before seen speeds. 14/10 paw-inspiring af \\n(IG: puffie_the_chow) https://t.co/ghXBIIeQZF \n", "82 This is Ginger. She's having a ruff Monday. Too many pupper things going on. H*ckin exhausting. 12/10 would snug passionately https://t.co/j211oCDRs6 \n", "92 This is Jed. He may be the fanciest pupper in the game right now. Knows it too. 13/10 would sign modeling contract https://t.co/0YplNnSMEm \n", "98 This is Sierra. She's one precious pupper. Absolute 12/10. Been in and out of ICU her whole life. Help Sierra below\\n\\nhttps://t.co/Xp01EU3qyD https://t.co/V5lkvrGLdQ\n", "107 This is Rover. As part of pupper protocol he had to at least attempt to eat the plant. Confirmed not tasty. Needs peanut butter. 12/10 https://t.co/AiVljI6QCg \n", "135 This is Jamesy. He gives a kiss to every other pupper he sees on his walk. 13/10 such passion, much tender https://t.co/wk7TfysWHr \n", "199 Sometimes you guys remind me just how impactful a pupper can be. Cooper will be remembered as a good boy by so many. 14/10 rest easy friend https://t.co/oBL7LEJEzR \n", "220 Say hello to Boomer. He's a sandy pupper. Having a h*ckin blast. 12/10 would pet passionately https://t.co/ecb3LvExde \n", "249 This is Pickles. She's a silly pupper. Thinks she's a dish. 12/10 would dry https://t.co/7mPCF4ZwEk \n", "293 Here's a pupper before and after being asked \"who's a good girl?\" Unsure as h*ck. 12/10 hint hint it's you https://t.co/ORiK6jlgdH \n", "297 This is Clark. He passed pupper training today. Round of appaws for Clark. 13/10 https://t.co/7pUjwe8X6B \n", "304 This is Ava. She just blasted off. Streamline af. Aerodynamic as h*ck. One small step for pupper, one giant leap for pupkind. 12/10 https://t.co/W4KffrdX3Q \n", "330 This is Gidget. She's a spy pupper. Stealthy as h*ck. 
Must've slipped pup and got caught. 12/10 would forgive then pet https://t.co/zD97KYFaFa \n", "352 I couldn't make it to the #WKCDogShow BUT I have people there on the ground relaying me the finest pupper pics possible. 13/10 for all https://t.co/jd6lYhfdH4 \n", "363 This is Astrid. She's a guide doggo in training. 13/10 would follow anywhere https://t.co/xo7FZFIAao \n", "378 This is Kona. Yesterday she stopped by the department to see what it takes to be a police pupper. 12/10 vest was only a smidge too big https://t.co/j8D3PQJvpJ \n", "402 Retweet the h*ck out of this 13/10 pupper #BellLetsTalk https://t.co/wBmc7OaGvS \n", "418 This is Gabe. He was the unequivocal embodiment of a dream meme, but also one h*ck of a pupper. You will be missed by so many. 14/10 RIP https://t.co/M3hZGadUuO \n", "444 Some happy pupper news to share. 10/10 for everyone involved \\nhttps://t.co/MefMAZX2uv \n", "478 Here's a pupper with squeaky hiccups. Please enjoy. 13/10 https://t.co/MiMKtsLN6k \n", "483 This is Cooper. Someone attacked him with a sharpie. Poor pupper. 11/10 nifty tongue slip tho https://t.co/01vpuRDXQ8 \n", "515 This is Craig. That's actually a normal sized fence he's stuck on. H*ckin massive pupper. 11/10 someone help him https://t.co/aAUXzoxaBy \n", "527 Here's a pupper in a onesie. Quite pupset about it. Currently plotting revenge. 12/10 would rescue https://t.co/xQfrbNK3HD \n", "533 This is Ollie Vue. He was a 3 legged pupper on a mission to overcome everything. This is very hard to write. 14/10 we will miss you Ollie https://t.co/qTRY2qX9y4 \n", "556 Pupper hath acquire enemy. 13/10 https://t.co/ns9qoElfsX \n", "575 This is Bones. He's being haunted by another doggo of roughly the same size. 12/10 deep breaths pupper everything's fine https://t.co/55Dqe0SJNj \n", "580 Here's a very sleepy pupper. Appears to be portable as h*ck. 12/10 would snug intensely https://t.co/61sX7pW5Ca \n", "608 Here's a helicopter pupper. He takes off at random. H*ckin hard to control. 
12/10 rare af https://t.co/GRWPgNKt2z \n", " ... \n", "1875 Meet Zuzu. He just graduated college. Astute pupper. Needs 2 leashes to contain him. Wasn't ready for the pic. 10/10 https://t.co/2H5SKmk0k7 \n", "1880 Say hello to Mollie. This pic was taken after she bet all her toys on Ronda Rousey. 10/10 hang in there pupper https://t.co/QMmAqA9VqO \n", "1889 This is Superpup. His head isn't proportional to his body. Has yet to serve any justice. 11/10 maybe one day pupper https://t.co/gxIFgg8ktm \n", "1897 Meet Rufio. He is unaware of the pink legless pupper wrapped around him. Might want to get that checked 10/10 & 4/10 https://t.co/KNfLnYPmYh \n", "1903 This pupper is fed up with being tickled. 12/10 I'm currently working on an elaborate heist to steal this dog https://t.co/F33n1hy3LL \n", "1907 This pupper just wants a belly rub. This pupper has nothing to do w the tree being sideways now. 10/10 good pupper https://t.co/AyJ7Ohk71f \n", "1915 This is Lennon. He's in quite the predicament. 8/10 hang in there pupper https://t.co/7mf8XXPAZv \n", "1921 This is Gus. He's super stoked about being an elephant. Couldn't be happier. 9/10 for elephant pupper https://t.co/gJS1qU0jP7 \n", "1930 This is Kaiya. She's an aspiring shoe model. 12/10 follow your dreams pupper https://t.co/nX8FiGRHvk \n", "1936 This is one esteemed pupper. Just graduated college. 10/10 what a champ https://t.co/nyReCVRiyd \n", "1937 This is Obie. He is on guard watching for evildoers from the comfort of his pumpkin. Very brave pupper. 11/10 https://t.co/cdwPTsGEAb \n", "1945 This is Raymond. He's absolutely terrified of floating tennis ball. 10/10 it'll be ok pupper https://t.co/QyH1CaY3SM \n", "1948 This is Pickles. She's a tiny pointy pupper. Average walker. Very skeptical of wet leaf. 8/10 https://t.co/lepRCaGcgw \n", "1954 This is Albert AKA King Banana Peel. He's a kind ruler of the kitchen. Very jubilant pupper. 10/10 overall great dog https://t.co/PN8hxgZ9We \n", "1956 This is Jeffri. 
He's a speckled ice pupper. Very lazy. Enjoys the occasional swim. Rather majestic really. 7/10 https://t.co/0iyItbtkr8 \n", "1960 This little pupper can't wait for Christmas. He's pretending to be a present. S'cute. 11/10 twenty more days 🎁🎄🐶 https://t.co/m8r9rbcgX4 \n", "1967 This is Django. He's a skilled assassin pupper. 10/10 https://t.co/w0YTuiRd1a \n", "1970 Meet Eve. She's a raging alcoholic 8/10 (would b 11/10 but pupper alcoholism is a tragic issue that I can't condone) https://t.co/U36HYQIijg \n", "1974 This is Fletcher. He's had a ruff night. No more Fireball for Fletcher. 8/10 it'll be over soon pupper https://t.co/tA4WpkI2cw \n", "1977 This is Schnozz. He's had a blurred tail since birth. Hasn't let that stop him. 10/10 inspirational pupper https://t.co/a3zYMcvbXG \n", "1980 This is Chuckles. He is one skeptical pupper. 10/10 stay woke Chuckles https://t.co/ZlcF0TIRW1 \n", "1981 This is Chet. He's having a hard time. Really struggling. 7/10 hang in there pupper https://t.co/eb4ta0xtnd \n", "1985 This is Cheryl AKA Queen Pupper of the Skies. Experienced fighter pilot. Much skill. True hero. 11/10 https://t.co/i4XJEWwdsp \n", "1991 This lil pupper is sad because we haven't found Kony yet. RT to spread awareness. 12/10 would pet firmly https://t.co/Cv7dRdcMvQ \n", "1992 This is Norman. Doesn't bark much. Very docile pup. Up to date on current events. Overall nifty pupper. 6/10 https://t.co/ntxsR98f3U \n", "1995 Meet Scott. Just trying to catch his train to work. Doesn't need everybody staring. 9/10 ignore the haters pupper https://t.co/jyXbZ35MYz \n", "2002 Say hello to Jazz. She should be on the cover of Vogue. 12/10 gorgeous pupper https://t.co/mVCMemhXAP \n", "2009 This is Rolf. He's having the time of his life. 11/10 good pupper https://t.co/OO6MqEbqG3 \n", "2015 This is Opal. He's a Royal John Coctostan. Ready for transport. Basically indestructible. 9/10 good pupper https://t.co/yRBQF9OS7D \n", "2017 This is Bubba. He's a Titted Peebles Aorta. 
Evolutionary masterpiece. Comfortable with his body. 8/10 great pupper https://t.co/aNkkl5nH3W \n", "Name: text, Length: 223, dtype: object" ] }, "execution_count": 72, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mask = archive_clean.dog_stages == 'pupper'\n", "\n", "archive_clean[mask].text" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In these tweets the word \"pupper\" seems to be meaningful." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "11. Explore the rating numerators and denominators to determine whether the ratings can be corrected or should be excluded." ] }, { "cell_type": "code", "execution_count": 73, "metadata": {}, "outputs": [], "source": [ "# denominators: every value except the most frequent one (10)\n", "\n", "denom_not_10 = archive_clean.rating_denominator.value_counts().index.tolist()[1:]" ] }, { "cell_type": "code", "execution_count": 74, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "433 The floofs have been released I repeat the floofs have been released. 84/70 https://t.co/NIYC820tmd \n", "516 Meet Sam. She smiles 24/7 & secretly aspires to be a reindeer. \\nKeep Sam smiling by clicking and sharing this link:\\nhttps://t.co/98tB8y7y7t https://t.co/LouL5vdvxx\n", "902 Why does this never happen at my front door... 165/150 https://t.co/HmwrdfEfUE \n", "1068 After so many requests, this is Bretagne. She was the last surviving 9/11 search dog, and our second ever 14/10. RIP https://t.co/XAVDNDaVgQ \n", "1120 Say hello to this unbelievably well behaved squad of doggos. 204/170 would try to pet all at once https://t.co/yGQI3He3xv \n", "1165 Happy 4/20 from the squad! 13/10 for all https://t.co/eV1diwds8a \n", "1202 This is Bluebert. He just saw that both #FinalFur match ups are split 50/50. Amazed af. 11/10 https://t.co/Kky1DPG4iq \n", "1228 Happy Saturday here's 9 puppers on a bench. 99/90 good work everybody https://t.co/mpvaVxKmc1 \n", "1254 Here's a brigade of puppers. 
All look very prepared for whatever happens next. 80/80 https://t.co/0eb7R1Om12 \n", "1274 From left to right:\\nCletus, Jerome, Alejandro, Burp, & Titson\\nNone know where camera is. 45/50 would hug all at once https://t.co/sedre1ivTK \n", "1351 Here is a whole flock of puppers. 60/50 I'll take the lot https://t.co/9dpcw6MdWa \n", "1433 Happy Wednesday here's a bucket of pups. 44/40 would pet all at once https://t.co/HppvrYuamZ \n", "1635 Someone help the girl is being mugged. Several are distracting her while two steal her shoes. Clever puppers 121/110 https://t.co/1zfnTJLt55 \n", "1662 This is Darrel. He just robbed a 7/11 and is in a high speed police chase. Was just spotted by the helicopter 10/10 https://t.co/7EsP8LmSp5 \n", "1779 IT'S PUPPERGEDDON. Total of 144/120 ...I think https://t.co/ZanVtAtvIq \n", "1843 Here we have an entire platoon of puppers. Total score: 88/80 would pet all at once https://t.co/y93p6FLvVw \n", "2335 This is an Albanian 3 1/2 legged Episcopalian. Loves well-polished hardwood flooring. Penis on the collar. 9/10 https://t.co/d9NcXFKwLv \n", "Name: text, dtype: object" ] }, "execution_count": 74, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mask = archive_clean.rating_denominator.isin(denom_not_10)\n", "\n", "archive_clean[mask].text" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are two main types of mistakes. \n", "1. The first occurrence of / was picked up, but it belonged to a date or something else rather than the rating. These cases are scarce and can be corrected manually.\n", "2. There are several dogs in the picture, and the rating is scaled by their number (for example, 99/90 for 9 puppers). 
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Of the tweets above: \n", "516 - no rating, should be excluded \n", "1068 - wrong numbers taken for ratings, should be 14/10 \n", "1165 - wrong numbers taken for ratings, should be 13/10 \n", "1202 - wrong numbers taken for rating, should be 11/10 \n", "1662 - wrong numbers taken for rating, should be 10/10 \n", "2335 - wrong numbers taken for rating, should be 9/10\n", "\n", "In other tweets ratings are \"adjusted\" by the number of dogs in the picture. Since the ratings will be used in float forms, this can be left as is for further division." ] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [], "source": [ "archive_clean = archive_clean.drop(516)\n", "\n", "archive_clean.rating_numerator[1068] = 14\n", "archive_clean.rating_denominator[1068] = 10\n", "\n", "archive_clean.rating_numerator[1165] = 13\n", "archive_clean.rating_denominator[1165] = 10\n", "\n", "archive_clean.rating_numerator[1202] = 11\n", "archive_clean.rating_denominator[1202] = 10\n", "\n", "archive_clean.rating_numerator[1662] = 10\n", "archive_clean.rating_denominator[1662] = 10\n", "\n", "archive_clean.rating_numerator[2335] = 9\n", "archive_clean.rating_denominator[2335] = 10" ] }, { "cell_type": "code", "execution_count": 76, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "10 2085\n", "80 2 \n", "50 2 \n", "170 1 \n", "150 1 \n", "120 1 \n", "110 1 \n", "90 1 \n", "70 1 \n", "40 1 \n", "Name: rating_denominator, dtype: int64" ] }, "execution_count": 76, "metadata": {}, "output_type": "execute_result" } ], "source": [ "archive_clean.rating_denominator.value_counts()" ] }, { "cell_type": "code", "execution_count": 77, "metadata": {}, "outputs": [], "source": [ "checked_denominators = archive_clean.rating_denominator.value_counts().index.tolist()[1:]\n", "mask = ~archive_clean.rating_denominator.isin(checked_denominators)" ] }, { "cell_type": "code", "execution_count": 78, "metadata": {}, 
"outputs": [ { "data": { "text/plain": [ "12 486\n", "10 437\n", "11 414\n", "13 288\n", "9 153\n", "8 98 \n", "7 51 \n", "14 39 \n", "5 34 \n", "6 32 \n", "3 19 \n", "4 15 \n", "2 9 \n", "1 4 \n", "75 1 \n", "420 1 \n", "26 1 \n", "27 1 \n", "1776 1 \n", "0 1 \n", "Name: rating_numerator, dtype: int64" ] }, "execution_count": 78, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# numerators\n", "\n", "archive_clean[mask].rating_numerator.value_counts()" ] }, { "cell_type": "code", "execution_count": 79, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " tweet_id \\\n", "695 786709082849828864 \n", "763 778027034220126208 \n", "979 749981277374128128 \n", "1712 680494726643068929 \n", "2074 670842764863651840 \n", "\n", " text \n", "695 This is Logan, the Chow who lived. He solemnly swears he's up to lots of good. H*ckin magical af 9.75/10 https://t.co/yBO5wuqaPS \n", "763 This is Sophie. She's a Jubilant Bush Pupper. Super h*ckin rare. Appears at random just to smile at the locals. 11.27/10 would smile back https://t.co/QFaUiIHxHq \n", "979 This is Atticus. He's quite simply America af. 1776/10 https://t.co/GRXwMxLBkh \n", "1712 Here we have uncovered an entire battalion of holiday puppers. Average of 11.26/10 https://t.co/eNm2S6p9BD \n", "2074 After so many requests... here you go.\\n\\nGood dogg. 420/10 https://t.co/yfAAo1gdeY " ] }, "execution_count": 79, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mask_num = (archive_clean[mask].rating_numerator > 14)\n", "\n", "archive_clean.loc[mask_num[mask_num == True].index, :][['tweet_id', 'text']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are several tweets where ratings are not in typical forms because of special occasions, like Christmas. Three last tweets may be dropped.\n", "Also not all numerators seem to be in integer format, may be useful to check for halves." ] }, { "cell_type": "code", "execution_count": 80, "metadata": {}, "outputs": [], "source": [ "archive_clean = archive_clean.drop([979, 1712, 2074])" ] }, { "cell_type": "code", "execution_count": 81, "metadata": {}, "outputs": [], "source": [ "archive_clean.rating_numerator = archive_clean.rating_numerator.astype(float)\n", "\n", "archive_clean.rating_numerator[695] = 9.75\n", "archive_clean.rating_numerator[763] = 11.27" ] }, { "cell_type": "code", "execution_count": 82, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "45 This is Bella. She hopes her smile made you smile. 
If not, she is also offering you her favorite monkey. 13.5/10 https://t.co/qjrljjt948 \n", "730 Who keeps sending in pictures without dogs in them? This needs to stop. 5/10 for the mediocre road https://t.co/ELqelxWMrC \n", "956 Please stop sending it pictures that don't even have a doggo or pupper in them. Churlish af. 5/10 neat couch tho https://t.co/u2c9c7qSg8 \n", "1399 This is Dave. He's a tropical pup. Short lil legs (dachshund mix?) Excels underwater, but refuses to eat kibble 5/10 https://t.co/ZJnCxlIf62 \n", "1461 Please only send in dogs. This t-rex is very scary. 5/10 ...might still pet (vid by @helizabethmicha) https://t.co/Vn6w5w8TO2 \n", "1508 When bae says they can't go out but you see them with someone else that same night. 5/10 & 10/10 for heartbroken pup https://t.co/aenk0KpoWM\n", "1583 Army of water dogs here. None of them know where they're going. Have no real purpose. Aggressive barks. 5/10 for all https://t.co/A88x73TwMN \n", "1619 This is Jerry. He's a neat dog. No legs (tragic). Has more horns than a dog usually does. Bark is unique af. 5/10 https://t.co/85q7xlplsJ \n", "1624 Here we have a basking dino pupper. Looks powerful. Occasionally shits eggs. Doesn't want the holidays to end. 5/10 https://t.co/DnNweb5eTO \n", "1645 This is Jiminy. He's not the brightest dog. Needs to lay off the kibble. 5/10 still petable https://t.co/omln4LOy1x \n", "1680 Unique dog here. Wrinkly as hell. Weird segmented neck. Finger on fire. Doesn't seem to notice. 5/10 might still pet https://t.co/Hy9La4xNX3 \n", "1727 Meet Penelope. She's a bacon frise. Total babe (lol get it like the movie). Doesn't bark tho. 5/10 very average dog https://t.co/SDcQYg0HSZ \n", "1796 This is Juckson. He's totally on his way to a nascar race. 5/10 for Juckson https://t.co/IoLRvF0Kak \n", "1808 Exotic handheld dog here. Appears unathletic. Feet look deadly. Can be thrown a great distance. 5/10 might pet idk https://t.co/Avq4awulqk \n", "1820 This is Bubbles. 
He kinda resembles a fish. Always makes eye contact with u no matter what. Sneaky tongue slip. 5/10 https://t.co/Nrhvc5tLFT \n", "1861 Rare shielded battle dog here. Very happy about abundance of lettuce. Painfully slow fetcher. Still petable. 5/10 https://t.co/C3tlKVq7eO \n", "1874 This is Steven. He got locked outside. Damn it Steven. 5/10 nice grill tho https://t.co/zf7Sxxjfp3 \n", "1901 Two gorgeous dogs here. Little waddling dog is a rebel. Refuses to look at camera. Must be a preteen. 5/10 & 8/10 https://t.co/YPfw7oahbD \n", "1904 Rare submerged pup here. Holds breath for a long time. Frowning because that spoon ignores him. 5/10 would still pet https://t.co/EJzzNHE8bE \n", "1925 This is Earl. Earl is lost. Someone help Earl. He has no tags. Just trying to get home. 5/10 hang in there Earl https://t.co/1ZbfqAVDg6 \n", "1979 Extraordinary dog here. Looks large. Just a head. No body. Rather intrusive. 5/10 would still pet https://t.co/ufHWUFA9Pu \n", "2013 Exotic underwater dog here. Very shy. Wont return tennis balls I toss him. Never been petted. 5/10 I bet he's soft https://t.co/WH7Nzc5IBA \n", "2026 This is Brad. He's a chubby lil pup. Doesn't really need the food he's trying to reach. 5/10 you've had enough Brad https://t.co/vPXKSaNsbE \n", "2063 This is Anthony. He just finished up his masters at Harvard. Unprofessional tattoos. Always looks perturbed. 5/10 https://t.co/iHLo9rGay1 \n", "2092 This dude slaps your girl's ass what do you do?\\n5/10 https://t.co/6dioUL6gcP \n", "2109 Vibrant dog here. Fabulous tail. Only 2 legs tho. Has wings but can barely fly (lame). Rather elusive. 5/10 okay pup https://t.co/cixC0M3P1e \n", "2134 This is Randall. He's from Chernobyl. Built playground himself. Has been stuck up there quite a while. 5/10 good dog https://t.co/pzrvc7wKGd \n", "2139 Awesome dog here. Not sure where it is tho. Spectacular camouflage. Enjoys leaves. Not very soft. 5/10 still petable https://t.co/rOTOteKx4q \n", "2153 This is a brave dog. 
Excellent free climber. Trying to get closer to God. Not very loyal though. Doesn't bark. 5/10 https://t.co/ODnILTr4QM \n", "2181 Two gorgeous pups here. Both have cute fake horns(adorable). Barn in the back looks on fire. 5/10 would pet rly well https://t.co/w5oYFXi0uh \n", "2206 Meet Zeek. He is a grey Cumulonimbus. Zeek is hungry. Someone should feed Zeek asap. 5/10 absolutely terrifying https://t.co/fvVNScw8VH \n", "2242 Wow. Armored dog here. Ready for battle. Face looks dangerous. Not very loyal. Lil dog on back havin a blast. 5/10 https://t.co/SyMoWrp368 \n", "2312 This is Josep. He is a Rye Manganese mix. Can drive w eyes closed. Very irresponsible. Menace on the roadways. 5/10 https://t.co/XNGeDwrtYH \n", "2351 Here we have a 1949 1st generation vulpix. Enjoys sweat tea and Fox News. Cannot be phased. 5/10 https://t.co/4B7cOc1EDq \n", "Name: text, dtype: object" ] }, "execution_count": 82, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mask = archive_clean.rating_numerator == 5\n", "\n", "archive_clean[mask].text" ] }, { "cell_type": "code", "execution_count": 83, "metadata": {}, "outputs": [], "source": [ "# tweet 45 is rated 13.5/10, but only the 5 after the decimal point was extracted\n", "archive_clean.loc[45, 'rating_numerator'] = 13.5" ] }, { "cell_type": "code", "execution_count": 84, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "12.00 486\n", "10.00 437\n", "11.00 414\n", "13.00 288\n", "9.00 153\n", "8.00 98 \n", "7.00 51 \n", "14.00 39 \n", "5.00 33 \n", "6.00 32 \n", "3.00 19 \n", "4.00 15 \n", "2.00 9 \n", "1.00 4 \n", "60.00 1 \n", "11.27 1 \n", "45.00 1 \n", "204.00 1 \n", "13.50 1 \n", "9.75 1 \n", "121.00 1 \n", "84.00 1 \n", "0.00 1 \n", "80.00 1 \n", "88.00 1 \n", "144.00 1 \n", "44.00 1 \n", "165.00 1 \n", "99.00 1 \n", "Name: rating_numerator, dtype: int64" ] }, "execution_count": 84, "metadata": {}, "output_type": "execute_result" } ], "source": [ "archive_clean.rating_numerator.value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", 
"metadata": {}, "source": [ "12. Combine the cleaned 'rating_numerator' and 'rating_denominator' columns in one 'rating' column in float format." ] }, { "cell_type": "code", "execution_count": 85, "metadata": {}, "outputs": [], "source": [ "archive_clean['rating'] = archive_clean.rating_numerator / archive_clean.rating_denominator" ] }, { "cell_type": "code", "execution_count": 86, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "count 2093.000000\n", "mean 1.061468 \n", "std 0.214564 \n", "min 0.000000 \n", "25% 1.000000 \n", "50% 1.100000 \n", "75% 1.200000 \n", "max 1.400000 \n", "Name: rating, dtype: float64" ] }, "execution_count": 86, "metadata": {}, "output_type": "execute_result" } ], "source": [ "archive_clean.rating.describe()" ] }, { "cell_type": "code", "execution_count": 87, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Int64Index: 2093 entries, 0 to 2355\n", "Data columns (total 9 columns):\n", "tweet_id 2093 non-null int64\n", "timestamp 2093 non-null datetime64[ns]\n", "source 2093 non-null category\n", "text 2093 non-null object\n", "expanded_urls 2090 non-null object\n", "name 1410 non-null object\n", "floofer 10 non-null object\n", "dog_stages 324 non-null object\n", "rating 2093 non-null float64\n", "dtypes: category(1), datetime64[ns](1), float64(1), int64(1), object(5)\n", "memory usage: 229.4+ KB\n" ] } ], "source": [ "archive_clean = archive_clean[['tweet_id', 'timestamp', 'source', \n", " 'text', 'expanded_urls', 'name', \n", " 'floofer', 'dog_stages', 'rating']]\n", "\n", "archive_clean.info()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "13. Join twitter archive data with image predictions data and additional information from Twitter on tweet IDs. 
Keep only the rows that are present in all three dataframes (inner join)." ] }, { "cell_type": "code", "execution_count": 88, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Int64Index: 1967 entries, 0 to 1966\n", "Data columns (total 20 columns):\n", "tweet_id 1967 non-null int64\n", "timestamp 1967 non-null datetime64[ns]\n", "source 1967 non-null category\n", "text 1967 non-null object\n", "expanded_urls 1967 non-null object\n", "name 1369 non-null object\n", "floofer 8 non-null object\n", "dog_stages 293 non-null object\n", "rating 1967 non-null float64\n", "jpg_url 1967 non-null object\n", "img_num 1967 non-null int64\n", "p1 1967 non-null object\n", "p1_conf 1967 non-null float64\n", "p1_dog 1967 non-null bool\n", "p2 1967 non-null object\n", "p2_conf 1967 non-null float64\n", "p2_dog 1967 non-null bool\n", "p3 1967 non-null object\n", "p3_conf 1967 non-null float64\n", "p3_dog 1967 non-null bool\n", "dtypes: bool(3), category(1), datetime64[ns](1), float64(4), int64(2), object(9)\n", "memory usage: 269.1+ KB\n" ] } ], "source": [ "twitter_archive_master = archive_clean.merge(image_predictions, on = 'tweet_id', suffixes = ('', '_imp'))\n", "twitter_archive_master.info()" ] }, { "cell_type": "code", "execution_count": 89, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Int64Index: 1965 entries, 0 to 1964\n", "Data columns (total 22 columns):\n", "tweet_id 1965 non-null int64\n", "timestamp 1965 non-null datetime64[ns]\n", "source 1965 non-null category\n", "text 1965 non-null object\n", "expanded_urls 1965 non-null object\n", "name 1367 non-null object\n", "floofer 8 non-null object\n", "dog_stages 293 non-null object\n", "rating 1965 non-null float64\n", "jpg_url 1965 non-null object\n", "img_num 1965 non-null int64\n", "p1 1965 non-null object\n", "p1_conf 1965 non-null float64\n", "p1_dog 1965 non-null bool\n", "p2 1965 non-null object\n", "p2_conf 1965 non-null float64\n", "p2_dog 1965 non-null 
bool\n", "p3 1965 non-null object\n", "p3_conf 1965 non-null float64\n", "p3_dog 1965 non-null bool\n", "favorite_count 1965 non-null int64\n", "retweet_count 1965 non-null int64\n", "dtypes: bool(3), category(1), datetime64[ns](1), float64(4), int64(4), object(9)\n", "memory usage: 299.5+ KB\n" ] } ], "source": [ "twitter_archive_master = twitter_archive_master.merge(tweet_jsons, on = 'tweet_id', suffixes = ('', '_jsons'))\n", "twitter_archive_master.info()" ] }, { "cell_type": "code", "execution_count": 90, "metadata": {}, "outputs": [], "source": [ "# writing cleaned data to csv file\n", "twitter_archive_master.to_csv('twitter_archive_master.csv', index = False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "# Data Analysis" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A separate text-only report on the data analysis, in HTML format, was recreated with R Markdown. It includes slightly fewer findings than the code here, because it was becoming too long. See it [here](act_report.html). 
" ] }, { "cell_type": "code", "execution_count": 91, "metadata": {}, "outputs": [], "source": [ "# setting up graphics\n", "import matplotlib.pyplot as plt \n", "\n", "% matplotlib inline\n", "plt.rcParams['figure.figsize'] = (10, 6)" ] }, { "cell_type": "code", "execution_count": 92, "metadata": {}, "outputs": [], "source": [ "# loading cleaned data\n", "df = pd.read_csv('twitter_archive_master.csv')" ] }, { "cell_type": "code", "execution_count": 93, "metadata": {}, "outputs": [], "source": [ "# fixing types\n", "df['timestamp'] = pd.to_datetime(df.timestamp)\n", "df['dog_stages'] = df.dog_stages.astype('category')\n", "df['source'] = df.source.astype('category')\n", "df = df.set_index('timestamp')" ] }, { "cell_type": "code", "execution_count": 94, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "DatetimeIndex: 1965 entries, 2017-08-01 16:23:56 to 2015-11-15 22:32:08\n", "Data columns (total 21 columns):\n", "tweet_id 1965 non-null int64\n", "source 1965 non-null category\n", "text 1965 non-null object\n", "expanded_urls 1965 non-null object\n", "name 1367 non-null object\n", "floofer 8 non-null object\n", "dog_stages 293 non-null category\n", "rating 1965 non-null float64\n", "jpg_url 1965 non-null object\n", "img_num 1965 non-null int64\n", "p1 1965 non-null object\n", "p1_conf 1965 non-null float64\n", "p1_dog 1965 non-null bool\n", "p2 1965 non-null object\n", "p2_conf 1965 non-null float64\n", "p2_dog 1965 non-null bool\n", "p3 1965 non-null object\n", "p3_conf 1965 non-null float64\n", "p3_dog 1965 non-null bool\n", "favorite_count 1965 non-null int64\n", "retweet_count 1965 non-null int64\n", "dtypes: bool(3), category(2), float64(4), int64(4), object(8)\n", "memory usage: 270.9+ KB\n" ] } ], "source": [ "df.info()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ok, Python. 
Who is the most favorited dog of all time at [@dog_rates](https://twitter.com/dog_rates/)? At least in this dataset." ] }, { "cell_type": "code", "execution_count": 95, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Tweet: Here's a doggo realizing you can stand in a pool. 13/10 enlightened af (vid by Tina Conrad) https://t.co/7wE9LTEXC4\n", " Favorite count: 163530\n", " Retweet_count: 83146\n" ] } ], "source": [ "# row with the highest favorite count\n", "top_dog = df.loc[df.favorite_count.idxmax(), : ]\n", "\n", "print(\"Tweet:\", top_dog.text + \"\\n\", \n", " \"Favorite count: \", str(top_dog.favorite_count) + \"\\n\", \n", " \"Retweet_count:\", top_dog.retweet_count)" ] }, { "cell_type": "code", "execution_count": 96, "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": 96, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from IPython.display import Image\n", "\n", "Image(url = top_dog.jpg_url)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is actually a video, and maybe you should [take a look](https://twitter.com/dog_rates/status/744234799360020481/video/1) too, though I guess I'm not the first to suggest that. By the way, the lowest rating went to a screenshot from another Twitter account, for plagiarism. Do you agree?" ] }, { "cell_type": "code", "execution_count": 97, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\"When you're so blinded by your systematic plagiarism that you forget what day it is. 0/10 https://t.co/YbEJPkg4Ag\"" ] }, "execution_count": 97, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# text of the tweet with the lowest rating\n", "df.loc[df.rating.idxmin()].text" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ok, let's be a bit more serious. 
The cleaned data set consists of 1965 rows and 22 variables, including data from the WeRateDogs Twitter archive, additional Twitter data gathered through the API, and dog breed predictions made by a neural network." ] }, { "cell_type": "code", "execution_count": 98, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " tweet_id rating img_num p1_conf p2_conf \\\n", "count 1.965000e+03 1965.000000 1965.000000 1965.000000 1.965000e+03 \n", "mean 7.360774e+17 1.054734 1.202545 0.594573 1.346915e-01 \n", "std 6.756862e+16 0.216694 0.559762 0.272061 1.010774e-01 \n", "min 6.660209e+17 0.000000 1.000000 0.044333 1.011300e-08 \n", "25% 6.758531e+17 1.000000 1.000000 0.362925 5.351500e-02 \n", "50% 7.088343e+17 1.100000 1.000000 0.587764 1.175080e-01 \n", "75% 7.881506e+17 1.200000 1.000000 0.847292 1.955730e-01 \n", "max 8.924206e+17 1.400000 4.000000 1.000000 4.880140e-01 \n", "\n", " p3_conf favorite_count retweet_count \n", "count 1.965000e+03 1965.000000 1965.000000 \n", "mean 6.021736e-02 8740.952672 2650.865649 \n", "std 5.099516e-02 12810.601111 4724.318781 \n", "min 1.740170e-10 78.000000 11.000000 \n", "25% 1.605590e-02 1887.000000 591.000000 \n", "50% 4.934910e-02 3939.000000 1273.000000 \n", "75% 9.160200e-02 10892.000000 3028.000000 \n", "max 2.734190e-01 163530.000000 83146.000000 " ] }, "execution_count": 98, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As can be seen from the summary statistics on favorites, with the mean favorite count of about 8741, our top dog is a real outlier. Same is true for the retweets - the mean is about 2651. The distributions seem to be noticeably right-skewed, we can change that with histograms." 
] }, { "cell_type": "code", "execution_count": 99, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXoAAAD8CAYAAAB5Pm/hAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4wLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvpW3flQAAEshJREFUeJzt3X+QXWV9x/H3t0RAiRJ+maZJpgsVnaIZf2RLsbadDdjyc4TOSAeH0Yg4mVbqaNVqKDO2djpT0FJQ2lEzoo02NVDEhgE6jo1srTMVJCDEiEjAFBYYIgZig9gx02//OM+Gm83+uHv33r03T96vmTt7znOee873PHvz2XPPOfcmMhNJUr1+qd8FSJJ6y6CXpMoZ9JJUOYNekipn0EtS5Qx6SaqcQS9JlTPoJalyBr0kVW5BvwsAOP7443NoaKij5z733HMcddRR3S1ojgaxJhjMuqypPYNYEwxmXYdSTVu2bHk6M0+YsWNm9v2xcuXK7NQdd9zR8XN7ZRBryhzMuqypPYNYU+Zg1nUo1QTcnW1krKduJKlyBr0kVc6gl6TKGfSSVDmDXpIqZ9BLUuUMekmqnEEvSZUz6CWpcgd90G99fDdDa29jaO1t/S5FkgbSQR/0kqTpGfSSVDmDXpIqZ9BLUuUMekmqnEEvSZUz6CWpcga9JFXOoJekyhn0klQ5g16SKmfQS1LlDHpJqpxBL0mVM+glqXIGvSRVzqCXpMoZ9JJUOYNekipn0EtS5Qx6SaqcQS9JlTPoJalybQd9RBwWEfdGxK1l/sSIuDMiHoqIGyLi8NJ+RJnfXpYP9aZ0SVI7ZnNE/z7ggZb5q4BrMvNk4Bng0tJ+KfBMZr4CuKb0kyT1SVtBHxHLgHOBz5X5AE4Hbipd1gMXlOnzyzxl+RmlvySpDxa02e9a4MPAS8v8ccCzmbm3zI8BS8v0UuAxgMzcGxG7S/+nu1LxNIbW3rZveseV5/Z6c5J0UIjMnL5DxHnAOZn5nogYAT4EXAL8Vzk9Q0QsB27PzBURsQ04MzPHyrKHgVMz8ycT1rsGWAOwePHilRs3buxoB3bu2s1Tzx/YvmLp0R2trxv27NnDwoUL+7b9qQxiXdbUnkGsCQazrkOpplWrVm3JzOGZ+rVzRP8m4C0RcQ5wJPAymiP8RRGxoBzVLwOeKP3HgOXAWEQsAI4Gdk1caWauA9YBDA8P58jISBulHOi6DZu4euuBu7Hj4s7W1w2jo6N0uj+9NIh1WVN7BrEmGMy6rOlAM56jz8zLM3NZZg4BFwHfyMyLgTuAt5Zuq4FNZfqWMk9Z/o2c6W2DJKln5nIf/UeAD0TEdppz8NeX9uuB40r7B4C1cytRkjQX7V6MBSAzR4HRMv0IcOokfX4OXNiF2iRJXeAnYyWpcga9JFXOoJekyhn0klQ5g16SKmfQS1LlDHpJqpxBL0mVM+glqXIGvSRVzqCXpMoZ9JJUOYNekipn0EtS5Qx6SaqcQS9JlTPoJalyBr0kVc6gl6TKGfSSVDmDXpIqZ9BLUuUMekmqnEEvSZUz6CWpcga9JFXOoJekyhn0klQ5g16SKmfQS1LlDHpJqpxBL0mVM+glqXIGvSRVzqCXpMoZ9JJUuRmDPiKOjIi7IuK+iNgWER8r7SdGxJ0R8VBE3BARh5f2I8r89rJ8qLe7IEmaTjtH9P8LnJ6ZrwVeB5wVEacBVwHXZObJwDPApaX/pcAzmfkK4JrST5LUJzMGfTb2lNkXlUcCpwM3lfb1wAVl+vwyT1l+RkRE1yqWJM1KW+foI+KwiPgusBP4OvAw8Gxm7i1dxoClZXop8BhAWb4bOK6bRUuS2heZ2X7niEXAV4GPAl8op2eIiOXA7Zm5IiK2AWdm5lhZ9j
Bwamb+ZMK61gBrABYvXrxy48aNHe3Azl27eer5A9tXLD26o/V1w549e1i4cGHftj+VQazLmtoziDXBYNZ1KNW0atWqLZk5PFO/BbNZaWY+GxGjwGnAoohYUI7alwFPlG5jwHJgLCIWAEcDuyZZ1zpgHcDw8HCOjIzMppR9rtuwiau3HrgbOy7ubH3dMDo6Sqf700uDWJc1tWcQa4LBrMuaDtTOXTcnlCN5IuLFwJuBB4A7gLeWbquBTWX6ljJPWf6NnM3bBklSV7VzRL8EWB8Rh9H8YbgxM2+NiO8DGyPir4F7getL/+uBL0XEdpoj+Yt6ULckqU0zBn1m3g+8fpL2R4BTJ2n/OXBhV6qTJM2Zn4yVpMoZ9JJUOYNekipn0EtS5Qx6SaqcQS9JlTPoJalyBr0kVc6gl6TKGfSSVDmDXpIqZ9BLUuUMekmqnEEvSZUz6CWpcga9JFXOoJekyhn0klQ5g16SKmfQS1LlDHpJqtyCfhfQK0Nrb9s3vePKc/tYiST1l0f0klQ5g16SKmfQS1LlDHpJqpxBL0mVM+glqXIGvSRVzqCXpMoZ9JJUOYNekipn0EtS5Qx6SaqcQS9JlTPoJalyMwZ9RCyPiDsi4oGI2BYR7yvtx0bE1yPiofLzmNIeEfGpiNgeEfdHxBt6vROSpKm1c0S/F/hgZv46cBpwWUScAqwFNmfmycDmMg9wNnByeawBPt31qiVJbZsx6DPzycy8p0z/D/AAsBQ4H1hfuq0HLijT5wNfzMa3gUURsaTrlUuS2hKZ2X7niCHgm8BrgEczc1HLsmcy85iIuBW4MjO/Vdo3Ax/JzLsnrGsNzRE/ixcvXrlx48aOdmDnrt089fz0fVYsPbqjdXdqz549LFy4cF632Y5BrMua2jOINcFg1nUo1bRq1aotmTk8U7+2/yvBiFgIfAV4f2b+NCKm7DpJ2wF/TTJzHbAOYHh4OEdGRtotZT/XbdjE1Vun340dF3e27k6Njo7S6f700iDWZU3tGcSaYDDrsqYDtXXXTUS8iCbkN2TmzaX5qfFTMuXnztI+Bixvefoy4InulCtJmq127roJ4Hrggcz8u5ZFtwCry/RqYFNL+zvK3TenAbsz88ku1ixJmoV2Tt28CXg7sDUivlva/hy4ErgxIi4FHgUuLMtuB84BtgM/Ay7pasWSpFmZMejLRdWpTsifMUn/BC6bY12SpC7xk7GSVDmDXpIqZ9BLUuUMekmqnEEvSZUz6CWpcga9JFXOoJekyhn0klQ5g16SKmfQS1LlDHpJqpxBL0mVM+glqXIGvSRVzqCXpMoZ9JJUOYNekipn0EtS5Qx6SaqcQS9JlTPoJalyBr0kVc6gl6TKGfSSVDmDXpIqZ9BLUuUMekmqnEEvSZVb0O8C5sPQ2tv2Te+48tw+ViJJ888jekmqnEEvSZUz6CWpcga9JFXOoJekys14101EfB44D9iZma8pbccCNwBDwA7gDzPzmYgI4JPAOcDPgHdm5j29Kb0zrXfggHfhSKpfO0f0/wicNaFtLbA5M08GNpd5gLOBk8tjDfDp7pQpSerUjEGfmd8Edk1oPh9YX6bXAxe0tH8xG98GFkXEkm4VK0mavU7P0S/OzCcBys+Xl/alwGMt/cZKmySpTyIzZ+4UMQTc2nKO/tnMXNSy/JnMPCYibgP+JjO/Vdo3Ax/OzC2TrHMNzekdFi9evHLjxo0d7cDOXbt56vmOngrAiqVHd/7kKezZs4eFCxd2fb1zNYh1WVN7BrEmGMy6DqWaVq1atSUzh2fq1+lXIDwVEUsy88lyamZnaR8Dlrf0WwY8MdkKMnMdsA5geHg4R0ZGOirkug2buHpr59/ksOPizrY7ndHRUTrdn14axLqsqT2DWBMMZl3WdKBOT93cAqwu06uBTS3t74jGacDu8VM8kqT+aOf2yi8DI8DxETEG/AVwJXBjRFwKPApcWLrfTnNr5Xaa2ysv6UHNkqRZmDHoM/NtUyw6Y5K+CVw216
IkSd1zSHxN8XT8CmNJtfMrECSpcga9JFXOoJekyhn0klQ5g16SKmfQS1LlDHpJqpxBL0mVO+Q/MNXKD09JqpFH9JJUOY/op+DRvaRaeEQvSZUz6CWpcga9JFXOoJekyhn0klQ577ppg3fgSDqYeUQvSZUz6CWpcga9JFXOc/Rz4Ll7SQcDj+glqXIGvSRVzlM3s9R6umYqWx/fzTtLP0/pSOo3j+glqXIe0feYF2wl9ZtH9JJUOY/ou6T1yP2DK/pYiCRN4BG9JFXOI/p5NNX5es/jS+olj+glqXIGvSRVzlM3fdLOB6/aea6neiTNxKAfMO2cx5ek2TDoB5jhLqkbehL0EXEW8EngMOBzmXllL7aj/XlKR9Jkuh70EXEY8A/A7wFjwHci4pbM/H63t6Wpj/oNfUnjenFEfyqwPTMfAYiIjcD5gEHfJ+Oh/8EVe/d9qyZ0dg3A+/+lg08vgn4p8FjL/Bjwmz3Yjuaok2sA7byDmMps/xjM9sL0VOuf7X5O/IPYzjamMttaZ6tff2ynOniYro5e19pa00jX195d8/17i8zs7gojLgTOzMx3l/m3A6dm5nsn9FsDrCmzrwIe7HCTxwNPd/jcXhnEmmAw67Km9gxiTTCYdR1KNf1qZp4wU6deHNGPActb5pcBT0zslJnrgHVz3VhE3J2Zw3NdTzcNYk0wmHVZU3sGsSYYzLqs6UC9+GTsd4CTI+LEiDgcuAi4pQfbkSS1oetH9Jm5NyL+BPgaze2Vn8/Mbd3ejiSpPT25jz4zbwdu78W6JzHn0z89MIg1wWDWZU3tGcSaYDDrsqYJun4xVpI0WPz2SkmqXWYetA/gLJrbMrcDa7u87uXAHcADwDbgfaX9WODrwEPl5zGlPYBPlVruB97Qsq7Vpf9DwOqW9pXA1vKcT1HeYbVR22HAvcCtZf5E4M6y/huAw0v7EWV+e1k+1LKOy0v7gzS3w85pTIFFwE3AD8qYvbHfYwX8afndfQ/4MnBkP8YK+DywE/heS1vPx2aqbUxT0yfK7+9+4KvAok7HoMNxPqCmlmUfAhI4vt/jVNrfW/Z7G/Dx+RynjvJsLk/u54Mm7B4GTgIOB+4DTuni+peMv3iAlwI/BE4BPj7+iwLWAleV6XOAfysvwNOAO1teRI+Un8eU6fF/1HfRBGKU557dZm0fAP6ZF4L+RuCiMv0Z4I/L9HuAz5Tpi4AbyvQpZbyOKC+0h8t4djymwHrg3WX6cJrg79tY0Xxw70fAi1vG6J39GCvgd4E3sH+o9nxsptrGNDX9PrCgTF/VUtOsx2C24zxVTaV9Oc3NHf/NC0Hfz3FaBfw7cESZf/l8jlNHedatYJzvR/mFfa1l/nLg8h5ubxPN9/c8CCwpbUuAB8v0Z4G3tfR/sCx/G/DZlvbPlrYlwA9a2vfrN00dy4DNwOnAreVF+zQv/APdNy7lH8cby/SC0i8mjtV4v07HFHgZTajGhPa+jRUvfEL72LLvtwJn9musgCH2D4uej81U25iqpgnL/gDYMNm+zTQGnbwmp6uJ5p3ia4EdvBD0fRsnmnB+8yRjNm/jNNvHwXyOfrKvWljaiw1FxBDwepq3UIsz80mA8vPlM9QzXfvYJO0zuRb4MPB/Zf444NnM3DvJevZtuyzfXfrPttaZnAT8GPhCRNwbEZ+LiKPo41hl5uPA3wKPAk/S7PsW+j9W4+ZjbKbaRjveRXPU20lNnbwmJxURbwEez8z7Jizq5zi9EvidiLgzIv4jIn6jw5q6Nk4zOZiDPiZpy65vJGIh8BXg/Zn50w7qmW37dLWcB+zMzC1tbHdeaioW0Ly9/XRmvh54juYt8FTmY6yOofkyvROBXwGOAs6eZj3zNVYz6XsdEXEFsBfY0IOa2q43Il4CXAF8dLLFXaxpthbQnBY6Dfgz4MaIiC7X1NXf68Ec9G191cJcRMSLaEJ+Q2beXJqfioglZfkSmgs109UzXfuyWdb/JuAtEbED2Ehz+uZaYFFEjH
8monU9+7Zdlh8N7Oqg1pmMAWOZeWeZv4km+Ps5Vm8GfpSZP87MXwA3A79F/8dq3HyMzVTbmFJErAbOAy7Oct6gg5qeZvbjPJlfo/lDfV95zS8D7omIX+6gpm6O0xhwczbuonl3fXwHNXVrnGbW6Tmffj9o/qo+QvNCGL/A8eourj+ALwLXTmj/BPtfuPl4mT6X/S8O3VXaj6U5f31MefwIOLYs+07pO35x6JxZ1DfCCxdj/4X9L+i8p0xfxv4XdG4s069m/4tGj9BcMOp4TIH/BF5Vpv+yjFPfxormG1O3AS8pz1lPc6dEX8aKA8/z9nxsptrGNDWdRfN14idM6DfrMZjtOE9V04RlO3jhHH0/x+mPgL8q06+kOcUS8zlOs86zboRivx40V95/SHNF+4our/u3ad4q3Q98tzzOoTlPtpnmdqjNLS+ioPkPVx6muYVruGVd76K5TWo7cElL+zDNrX8PA3/PLC62sH/Qn0RzR8H28sIZvxvgyDK/vSw/qeX5V5TtPkjLHSydjinwOuDuMl7/Wv6R9XWsgI/R3C74PeBL5R/gvI8Vza2dTwK/oDlSu3Q+xmaqbUxT03aa0Bp/vX+m0zHocJwPqGnCOO5g/9sr+zVOhwP/VNZ1D3D6fI5TJw8/GStJlTuYz9FLktpg0EtS5Qx6SaqcQS9JlTPoJalyBr0kVc6gl6TKGfSSVLn/B2lOK2mnH3WOAAAAAElFTkSuQmCC\n", "text/plain": [ "" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "df.favorite_count.hist(bins = 100);" ] }, { "cell_type": "code", "execution_count": 100, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXoAAAD8CAYAAAB5Pm/hAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4wLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvpW3flQAAFHJJREFUeJzt3X+MXWWdx/H3d6n8EJQWkEm3bbYQG1eyRKwTt64bM1B1KRjLH5BgyFLZbrrZZY2uJFrWPzYm+0fZLP6A3eBORLeYamVRtw2ipincGJMFBUGKVuyIFcbWVgWqA/7Yrt/94z6FSzvtPXPnTmfu0/crmdxznvPcc5/zzelnnjn33NvITCRJ9fqD2R6AJGlmGfSSVDmDXpIqZ9BLUuUMekmqnEEvSZUz6CWpcga9JFXOoJekys2b7QEAnHPOObl06dKenvvcc89x+umn93dAlbFGzVin7qxRd8ezRg899NDPM/NV3frNiaBfunQpDz74YE/PbbVajIyM9HdAlbFGzVin7qxRd8ezRhHx4yb9vHQjSZUz6CWpcga9JFXOoJekynUN+oh4TUQ80vHzy4h4X0ScFRHbImJXeVxQ+kdE3BIRYxHxaEQsn/nDkCQdTdegz8zHM/OizLwIeAPwPPAlYD2wPTOXAdvLOsAqYFn5WQfcNhMDlyQ1M9VLNyuBH2bmj4HVwMbSvhG4oiyvBu7ItvuB+RGxsC+jlSRN2VSD/mrgc2V5KDP3ApTHc0v7IuCpjueMlzZJ0ixo/IGpiDgZeCdwY7euk7Qd8R/TRsQ62pd2GBoaotVqNR3KS0xMTPT83BOFNWrGOnVnjbqbizWayidjVwHfzsx9ZX1fRCzMzL3l0sz+0j4OLOl43mJgz+E7y8xRYBRgeHg4e/0k2a2btnDzN54DYPeGy3vaR+38NGMz1qk7a9TdXKzRVC7dvIsXL9sAbAXWlOU1wJaO9mvL3TcrgAOHLvFIko6/
RjP6iHg58DbgbzqaNwB3RsRa4EngqtJ+D3AZMEb7Dp3r+jZaSdKUNQr6zHweOPuwtl/Qvgvn8L4JXN+X0UmSps1PxkpS5Qx6SaqcQS9JlTPoJalyBr0kVc6gl6TKGfSSVDmDXpIqZ9BLUuUMekmqnEEvSZUz6CWpcga9JFXOoJekyhn0klQ5g16SKmfQS1LlDHpJqpxBL0mVM+glqXIGvSRVrlHQR8T8iLgrIr4fETsj4k0RcVZEbIuIXeVxQekbEXFLRIxFxKMRsXxmD0GSdCxNZ/QfB76amX8MvA7YCawHtmfmMmB7WQdYBSwrP+uA2/o6YknSlHQN+oh4JfAW4HaAzPxdZj4LrAY2lm4bgSvK8mrgjmy7H5gfEQv7PnJJUiNNZvTnAz8DPh0RD0fEJyPidGAoM/cClMdzS/9FwFMdzx8vbZKkWTCvYZ/lwHsy84GI+DgvXqaZTEzSlkd0ilhH+9IOQ0NDtFqtBkM50tBpcMOFBwF63kftJiYmrE0D1qk7a9TdXKxRk6AfB8Yz84GyfhftoN8XEQszc2+5NLO/o/+SjucvBvYcvtPMHAVGAYaHh3NkZKSnA7h10xZu3tE+jN3X9LaP2rVaLXqt74nEOnVnjbqbizXqeukmM38KPBURrylNK4HvAVuBNaVtDbClLG8Fri1336wADhy6xCNJOv6azOgB3gNsioiTgSeA62j/krgzItYCTwJXlb73AJcBY8Dzpa8kaZY0CvrMfAQYnmTTykn6JnD9NMclSeoTPxkrSZUz6CWpcga9JFXOoJekyhn0klQ5g16SKmfQS1LlDHpJqpxBL0mVM+glqXIGvSRVzqCXpMoZ9JJUOYNekipn0EtS5Qx6SaqcQS9JlTPoJalyBr0kVc6gl6TKGfSSVLlGQR8RuyNiR0Q8EhEPlrazImJbROwqjwtKe0TELRExFhGPRsTymTwASdKxTWVGf3FmXpSZw2V9PbA9M5cB28s6wCpgWflZB9zWr8FKkqZuOpduVgMby/JG4IqO9juy7X5gfkQsnMbrSJKmITKze6eIHwHPAAn8R2aORsSzmTm/o88zmbkgIu4GNmTmN0r7duCDmfngYftcR3vGz9DQ0Bs2b97c0wHsf/oA+37dXr5w0Zk97aN2ExMTnHHGGbM9jDnPOnVnjbo7njW6+OKLH+q4ynJU8xru782ZuScizgW2RcT3j9E3Jmk74rdJZo4CowDDw8M5MjLScCgvdeumLdy8o30Yu6/pbR+1a7Va9FrfE4l16s4adTcXa9To0k1m7imP+4EvAW8E9h26JFMe95fu48CSjqcvBvb0a8CSpKnpGvQRcXpEvOLQMvB24DFgK7CmdFsDbCnLW4Fry903K4ADmbm37yOXJDXS5NLNEPCliDjU/7OZ+dWI+BZwZ0SsBZ4Erir97wEuA8aA54Hr+j5qSVJjXYM+M58AXjdJ+y+AlZO0J3B9X0YnSZo2PxkrSZUz6CWpcga9JFXOoJekyhn0klQ5g16SKmfQS1LlDHpJqpxBL0mVM+glqXIGvSRVrun30Q+Epeu//MLy7g2Xz+JIJGnucEYvSZUz6CWpcga9JFXOoJekyhn0klQ5g16SKmfQS1LlDHpJqlzjoI+IkyLi4Yi4u6yfFxEPRMSuiPh8RJxc2k8p62Nl+9KZGbokqYmpzOjfC+zsWL8J+GhmLgOeAdaW9rXAM5n5auCjpZ8kaZY0CvqIWAxcDnyyrAdwCXBX6bIRuKIsry7rlO0rS39J0ixoOqP/GPAB4Pdl/Wzg2cw8WNbHgUVleRHwFEDZfqD0lyTNgq5fahYR7wD2Z+ZDETFyqHmSrtlgW+d+1wHrAIaGhmi1Wk3Ge4Sh0+CGCw8e0d7r/mo0MTFhPRqwTt1Zo+7mYo2afHvlm4F3RsRlwKnAK2nP8OdHxLwya18M7Cn9x4ElwHhEzAPOBJ4+fKeZOQqMAgwPD+fIyEhPB3Drpi3cvOPIw9h9TW/7q1Gr1aLX+p5IrFN31qi7
\n", "text/plain": [ "" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "df.retweet_count.hist(bins = 100);" ] }, { "cell_type": "code", "execution_count": 101, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "timestamp = df.index\n", "\n", "plt.hist(timestamp, bins = 100);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As the plot above shows, WeRateDogs put a lot of effort into promoting the account, posting very frequently during its first months. The mean retweet and favorite counts per month show whether that effort paid off." 
] }, { "cell_type": "code", "execution_count": 102, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plot = df.groupby([df.index.year, df.index.month]).retweet_count.mean().plot()\n", "plot.set(xlabel = 'Time', ylabel = 'Count', title = 'Mean Retweet Count Per Month');" ] }, { "cell_type": "code", "execution_count": 103, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plot = df.groupby([df.index.year, df.index.month]).favorite_count.mean().plot()\n", "plot.set(xlabel = 'Time', ylabel = 'Count', title = 'Mean Favorite Count Per Month');" ] }, { "cell_type": "code", "execution_count": 104, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(1.0547338422391856, 1.1)" ] }, "execution_count": 104, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.rating.mean(), df.rating.median()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The median rating is 11/10 and the interquartile range is between 10/10 and 12/10." 
] }, { "cell_type": "code", "execution_count": 105, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "df.rating.hist(bins = 20);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "However, a dog doesn't need the highest possible rating to be the most popular: the highest favorite and retweet counts are in the 13/10 group (see the plots below). Perhaps 14/10 is too subjective?" ] }, { "cell_type": "code", "execution_count": 106, "metadata": {}, "outputs": [ { "data": { "image/png": "
ck+f434Ptmth3AzLaUOMeMJLkacGS4PRbYWML8DiZh9iiwbYBD5gF3WGQZME5SYfs5D9FguZrZ7zL/7ynvv60kP1eATwP3Aqn9nnqxyc8kYH3W/fYQy3mMmXUDO4HibQeZXJJcs11E9K2xHAbNVdJbgMlm9vNSJhYjyc/2eOB4Sb+VtExS4fs5D02SXK8BLpTUTvTN9tOlSS1v+f5OV4py/tsalKRJwIeBf0nzdYbt5mkpydVC6T93PMkxpZA4D0kXAi3AX6aaUbwBc5VUA1wPfLxUCQ0iyc+2jqgr7XSib7W/lvQmM9uRcm79Jcn1AuA2M/uWpNOAH4Vce9NPLy+V8m8rMUnvIio2f17uXAbwHeByM+tJsxPGi01+2oHJWfebObTLIXNMu6Q6om6JwZqwaUiSK5LeC3wJ+Esz6yxRbv0NlusRwJuAR8I/htcDSyTNNbNy7OWd9PdgmZl1AS9Iepao+DxRmhT75DFYrhcBcwDM7DFJI4gWaCxX11+cRL/TlULSTOBm4Awz6yh3PgNoAe4O/7YmAGdK6jaz+4r5It6Nlp8ngBmSpklqIJoAsKTfMUuABeH2OcBDFkbgSmzQXEPX1L8Cc8s4pgCD5GpmO81sgplNNbOpRH3g5So0kOz34D6iCRhImkDUrfZ8SbOMJMl1HfAeAElvBEYAW0uaZTJLgPlhVtqpwE4z21TupHKRNAX4GfAxM/tjufMZiJlNy/q3tQj4u2IXGvCWTV7MrFvSp4AHiWZv3GpmayV9GWg1syXALUTdEG1ELZrzKzjXbwJjgHvCt5p1Zja3QnOtGAnzfRB4v6SngR7gf5Xj223CXD8P/JukzxJ1S328HF+QJN1F1O04IYwfXQ3Uh/fxL0TjSWcCbcBe4BOlzjEjQa5XEY3V/iD82+q2Mq0EnSDX0uRRni/dzjnnDifejeaccy51Xmycc86lzouNc8651Hmxcc45lzovNs4551Lnxca5Egir6a6U9JSk/ytp3CDHj5P0d1n3j5W0KP1MnUuHT312rgQk7TazMeH27cAfzexrAxw/Ffi5mb2pNBk6ly5v2ThXeo8RFpCUNEbRXkK/l7RGUmZF5m8AfxJaQ9+UNDWzH4mkj0v6maT/kPScpOsyJ5Z0kaQ/SnpE0r9J+l7J351zOfgKAs6VkKRaoqVhbgmh/cCHzezVsKzNMklLgCuAN5nZrPC8qf1ONQt4C9AJPCvpu0QrFfwD0d4lu4j2J1mV6htyLiEvNs6VxkhJK4GpwApgaYgL+LqkdwK9RC2eoxOc71dmthMgLIlzHNEiiv9pZttC/B6iNdmcKzvvRnOuNPaFVspxRDtmXhriHwUmAqeExzcTLYQ5mOwVunuIvjiWY5M+5xLxYuNcCYXWyN8DX5BUT7QFxRYz6wp7nxwXDt1FtLVCPh4H/lJSU9je4uxi5e1cobzYOFdiZvYk0VjK+cBPgBZJrUStnGfCMR3Ab8NU6W8mPO8G4OvAcuCXwNNEO8U6V3Y+9dm5YUTSGDPbHVo2/060pcC/lzsv57xl49zwck2YiPAU8ALRJm7OlZ23bJxzzqXOWzbOOedS58XGOedc6rzYOOecS50XG+ecc6nzYuOccy51/x8AEGt82qsZMgAAAABJRU5ErkJggg==\n", "text/plain": [ "" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plot = df.plot.scatter(x = 'rating', y = 'favorite_count')\n", "plot.set(xlabel = 'Rating', ylabel = 'Favorites', title = 'Favorites vs Rating');" ] }, { "cell_type": "code", "execution_count": 107, "metadata": {}, "outputs": [ { 
"data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plot = df.plot.scatter(x = 'rating', y = 'retweet_count')\n", "plot.set(xlabel = 'Rating', ylabel = 'Retweets', title = 'Retweets vs Rating');" ] }, { "cell_type": "code", "execution_count": 108, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plot = df.plot.scatter(x = 'retweet_count', y = 'favorite_count')\n", "plot.set(xlabel = 'Retweets', ylabel = 'Favorites', title = 'Favoriting & Retweeting');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The more retweets, the more likes. Did you expect that? Or should it be the other way around?" ] }, { "cell_type": "code", "execution_count": 109, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/opt/conda/lib/python3.6/site-packages/numpy/core/fromnumeric.py:57: FutureWarning: reshape is deprecated and will raise in a subsequent release. Please use .values.reshape(...) instead\n", " return getattr(obj, method)(*args, **kwds)\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plot = df.boxplot(column = 'rating', by = 'dog_stages')\n", "plot.set(xlabel = 'Dog Stages', ylabel = 'Rating', title = 'Rating By Dog Stages');\n", "\n", "# This cell doesn't produce any warnings on my local machine. See the act_report.html.\n", "# It seems like Project Workspace needs some upgrade )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you like puppies, I may have some bad news for you: their cuteness seems to win them on average lower rating, than the other stages demonstrate. The following two boxplots on ratings and favorites show the same tendency." 
] }, { "cell_type": "code", "execution_count": 110, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/opt/conda/lib/python3.6/site-packages/numpy/core/fromnumeric.py:57: FutureWarning: reshape is deprecated and will raise in a subsequent release. Please use .values.reshape(...) instead\n", " return getattr(obj, method)(*args, **kwds)\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plot = df.boxplot(column = 'favorite_count', by = 'dog_stages')\n", "plot.set(xlabel = 'Dog Stages', ylabel = 'Favorites', title = 'Favorites By Dog Stages');\n", "\n", "# This cell doesn't produce any warnings on my local machine. See the act_report.html.\n", "# It seems like Project Workspace needs some upgrade )" ] }, { "cell_type": "code", "execution_count": 111, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/opt/conda/lib/python3.6/site-packages/numpy/core/fromnumeric.py:57: FutureWarning: reshape is deprecated and will raise in a subsequent release. Please use .values.reshape(...) instead\n", " return getattr(obj, method)(*args, **kwds)\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plot = df.boxplot(column = 'retweet_count', by = 'dog_stages')\n", "plot.set(xlabel = 'Dog Stages', ylabel = 'Retweets', title = 'Retweets By Dog Stages');\n", "\n", "# This cell doesn't produce any warnings on my local machine. 
See the act_report.html.\n", "# It seems like Project Workspace needs some upgrade )" ] }, { "cell_type": "code", "execution_count": 112, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "pupper 203\n", "doggo 62 \n", "puppo 24 \n", "pupper,doggo 4 \n", "Name: dog_stages, dtype: int64" ] }, "execution_count": 112, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.dog_stages.value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It's a pity that there is not enough data to judge whether a dog paired with a pup really does better on average than the others. But we can use our ~~subjective~~ expert opinion here. Aren't they great?" ] }, { "cell_type": "code", "execution_count": 114, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# URLs of the images tagged with both stages\n", "parents = list(df[df.dog_stages == 'pupper,doggo'].jpg_url)\n", "\n", "import matplotlib.pyplot as plt\n", "from skimage import io\n", "\n", "imgs = []\n", "for pair in parents:\n", "    imgs.append(io.imread(pair, 0))\n", "\n", "plt.figure(figsize=(20,5))\n", "columns = 4\n", "for i, img in enumerate(imgs):\n", "    # integer division: subplot() needs an integer number of rows\n", "    plt.subplot(len(imgs) // columns + 1, columns, i + 1)\n", "    plt.imshow(img)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "#### Reference\n", "\n", "1. Heavily exploited Stack Overflow post on [multiple images in Jupyter notebooks](https://stackoverflow.com/questions/19471814/display-multiple-images-in-one-ipython-notebook-cell)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.3" } }, "nbformat": 4, "nbformat_minor": 2 }