Wednesday, April 22, 2020, day 41 of the lockdown
Working at home 4 days a week and teaching at home 4.5 days a week, I received a challenge from my LinkedIn contact Ijeoma Irene Agbugba, and I have accepted it. It is the following post:
This is the report I made out of it
How did I get there?
First, I did a check in Python to get an idea of the data (download the file to read my analysis).
Recap of what I found in Python:
- The data has 3,229 rows (= messages sent) and 13 columns.
What the column headers mean, looking at the data in relation to sending a tweet (in the order of the file):

| Column | Meaning |
|---|---|
| `created_at` | When the tweet was posted |
| `text` | The text of the tweet |
| `in_reply_to_status_id_str` | The ID of the tweet being replied to, if the tweet is a reply |
| `source` | The source used to post the tweet |
| `in_reply_to_screen_name` | The screen name of the author of the tweet being replied to |
| `contributors` | All blank in this dataset |
| `quoted_status_id` | The ID of the quoted tweet, if the tweet quotes another tweet |
| `quote_count` | All blank in this dataset |
| `reply_count` | How many replies the tweet received |
| `retweet_count` | How many times the tweet was retweeted |
| `favorite_count` | How many times the tweet was liked |
| `retweeted` | All blank in this dataset |
| `followers_count` | All blank in this dataset |

- All tweet texts are unique.
- The messages have been sent at 3,085 different timestamps
- 7 different sources have been used:
  - 3,097 times Twitter for iPhone
  - 62 times Twitter Web App
  - 61 times Twitter Web Client
  - 4 times Medium
  - 3 times Periscope
  - 1 time Twitter Media Studio
  - 1 time Facebook
- There are 65 different screen_names that have been replied to.
- atiku was replied to most often (943 times); next in line was bjay75, with 5 replies.
- 310 IDs have been quoted.
- For 2,913 messages the quoted_status_id was not available.
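The recap figures above could be reproduced with a short script. This is a minimal sketch using only Python's standard library (the original check may well have used pandas instead); it assumes the data was downloaded as a CSV file whose headers match the column table, and it treats empty cells as missing values. `load_rows`, `is_present` and `summarise` are hypothetical helper names, not part of any library.

```python
import csv
from collections import Counter


def load_rows(path):
    """Read the tweets CSV into a list of dicts, one dict per tweet.

    Assumes the header row uses the column names from the table above.
    """
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def is_present(value):
    """Treat None and empty or whitespace-only cells as missing values."""
    return value is not None and value.strip() != ""


def summarise(rows):
    """Recompute the recap figures from a list of tweet dicts."""
    texts = [r["text"] for r in rows]
    quoted = [r["quoted_status_id"] for r in rows]
    return {
        "rows": len(rows),
        "columns": len(rows[0]) if rows else 0,
        "all_texts_unique": len(set(texts)) == len(texts),
        "distinct_timestamps": len({r["created_at"] for r in rows}),
        # Counter.most_common() lists the sources sorted by frequency.
        "sources": Counter(r["source"] for r in rows),
        # Only tweets that are replies have a screen name to count.
        "replied_to": Counter(
            r["in_reply_to_screen_name"]
            for r in rows
            if is_present(r["in_reply_to_screen_name"])
        ),
        "distinct_quoted_ids": len({q for q in quoted if is_present(q)}),
        "missing_quoted_id": sum(1 for q in quoted if not is_present(q)),
    }
```

For example, `summarise(load_rows("tweets.csv"))["sources"].most_common()` would list the sources by frequency; the file name `tweets.csv` is hypothetical.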
Time to import the data set into Power BI
Let’s create some visuals and answer the set of questions.
- In total, 3,229 tweets were sent.
- The tweets were sent at 3,085 different timestamps, on 1,039 different dates.
- 7 source types are used.
- The top 4 screen_names that have been replied to.
- 310 ID numbers have been quoted.
- I made a top 5 of the most active tweeting days.
- 545 tweets have been retweeted; the original data set has no column that shows whose tweet was retweeted.
- Most tweets are sent between 7 AM and 8 PM; the top 5 hours are 7:00-8:59, 11:00-11:59, 17:00-17:59 and 19:00-19:59.
- Only the source Twitter for iPhone names a device; for the other sources, the device is unknown.
- The first tweet was posted on 21-11-2014 and the last one on 19-12-2019.
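- On average there are 3.11 tweets per active day (3,229 tweets over 1,039 dates); spread over the full time frame of 1,855 calendar days, that is about 1.74 tweets a day.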
- The most tweets were posted on Sunday, February 3, 2019: 48 tweets.
- Looking at months alone, December has the most tweets (390); split by month and year, February 2019 has the most, with 190.
- Using the Word Cloud visual in Power BI, I looked at the words used. After removing stop words, “https” and “rt”, the most used words are atiku (462) and Nigeria (420).
- I made a Word Cloud for the top 10 hashtags. First, I isolated the hashtags into a new column.
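The date, hour, month and word figures in the list above come from Power BI visuals. As a cross-check, the same aggregations might be sketched in Python like this. It assumes `created_at` is stored as text such as `2019-02-03 07:15:00`, folds hashtags to lower case, and takes the stop-word list as a parameter; Power BI's Word Cloud applies its own built-in stop-word list, so counts can differ. `time_stats`, `word_counts` and `top_hashtags` are hypothetical names, and this is not how the report itself was built.

```python
import re
from collections import Counter
from datetime import datetime

# Assumption: created_at looks like "2019-02-03 07:15:00".
# Adjust this format string if the file stores timestamps differently.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"

HASHTAG = re.compile(r"#(\w+)")
WORD = re.compile(r"\w+")


def time_stats(timestamps, fmt=TIMESTAMP_FORMAT):
    """Aggregate created_at values by date, hour and month."""
    moments = [datetime.strptime(t, fmt) for t in timestamps]
    per_date = Counter(m.date() for m in moments)
    first, last = min(moments).date(), max(moments).date()
    calendar_days = (last - first).days + 1  # count both end dates
    return {
        "active_dates": len(per_date),
        # Average over dates that have at least one tweet.
        "per_active_day": round(len(moments) / len(per_date), 2),
        # Average over every day from the first tweet to the last.
        "per_calendar_day": round(len(moments) / calendar_days, 2),
        "busiest_date": per_date.most_common(1)[0],  # (date, count)
        "by_hour": Counter(m.hour for m in moments),
        "by_month": Counter(m.month for m in moments),
        "by_year_month": Counter((m.year, m.month) for m in moments),
    }


def word_counts(texts, stop_words):
    """Count lower-cased words, skipping stop words plus "https" and "rt"."""
    excluded = {w.lower() for w in stop_words} | {"https", "rt"}
    counts = Counter()
    for text in texts:
        counts.update(w for w in WORD.findall(text.lower()) if w not in excluded)
    return counts


def top_hashtags(texts, n=10):
    """Extract hashtags (case-folded) and return the n most common."""
    tags = Counter(tag.lower() for text in texts for tag in HASHTAG.findall(text))
    return tags.most_common(n)
```

Dividing by active dates and dividing by calendar days answer different questions, so the sketch returns both. Note also that `\w+` splits a URL such as `https://t.co/x` into fragments like `t` and `co`; a fuller stop-word list would remove those as well.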
Once I had the visuals, I opened PowerPoint to create a background for the report and to organise the dashboard, which gave me the end result. Because it is about tweets, I inserted the Twitter logo and used the Twitter blue colour.