Challenge on LinkedIn

Wednesday, April 22, 2020, day 41 of the lockdown

Working at home 4 days a week and teaching at home 4.5 days a week, I received a challenge from my LinkedIn contact Ijeoma Irene Agbugba, and I have accepted it. It is the following post:

The challenge

The result

This is the report I made out of it

The Road

How did I get there?

Python

First, I did a check in Python to get an idea of the data. Download the file to read my analysis.

Recap on what I found in Python.

  • The data has 3,229 rows (= messages sent) and 13 columns

What the column headers mean in connection with sending a tweet (in the order of the file):

‘created_at’: when the tweet was posted
‘text’: the text of the tweet
‘in_reply_to_status_id_str’: the ID number of another tweet, if this tweet is a reply
‘source’: the source used to post the tweet
‘in_reply_to_screen_name’: the screen name of the poster of the tweet that was replied to
‘contributors’: all blanks for this dataset
‘quoted_status_id’: the ID number of another tweet, if it has been quoted
‘quote_count’: all blanks for this dataset
‘reply_count’: how many times this tweet has been replied to
‘retweet_count’: how many times this tweet has been retweeted
‘favorite_count’: how many times this tweet has been liked
‘retweeted’: all blanks for this dataset
‘followers_count’: all blanks for this dataset
  • All tweets are unique in text.
  • The messages have been sent at 3,085 different timestamps
  • 7 different sources have been used
    • 3,097 times Twitter for iPhone
    • 62 times Twitter Web App
    • 61 times Twitter Web Client
    • 4 times Medium
    • 3 times Periscope
    • 1 time Twitter Media Studio
    • 1 time Facebook
  • There are 65 different screen_names that have been replied to.
    • 943 times there was a reply to atiku; next in line was bjay75, who was replied to 5 times
  • 310 IDs have been quoted.
    • For 2,913 messages the quoted_status_id was not available.
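The notebook itself is not shown here, so the following is only a minimal sketch of how checks like the ones above could be done with the Python standard library. The function name `summarise` and the sample rows are hypothetical; the real file has 3,229 rows and 13 columns, and only five of those columns are used in this sketch.

```python
from collections import Counter

# Hypothetical sample rows, only to make the sketch runnable.
sample_rows = [
    {"created_at": "2019-02-03 07:15:00", "text": "Good morning Nigeria",
     "source": "Twitter for iPhone", "in_reply_to_screen_name": "atiku",
     "quoted_status_id": ""},
    {"created_at": "2019-02-03 07:15:00", "text": "Thank you all",
     "source": "Twitter for iPhone", "in_reply_to_screen_name": "atiku",
     "quoted_status_id": "111"},
    {"created_at": "2019-02-03 11:30:00", "text": "Watch live",
     "source": "Periscope", "in_reply_to_screen_name": "",
     "quoted_status_id": ""},
    {"created_at": "2019-12-19 19:05:00", "text": "Read my article",
     "source": "Medium", "in_reply_to_screen_name": "bjay75",
     "quoted_status_id": "222"},
]


def summarise(rows):
    """Return the basic counts listed in the recap."""
    def filled(column):
        # Values of one column, with blank cells left out.
        return [row[column] for row in rows if row[column]]

    return {
        "rows": len(rows),
        "columns": len(rows[0]) if rows else 0,
        "unique_texts": len(set(filled("text"))),
        "unique_timestamps": len(set(filled("created_at"))),
        # most_common() sorts from the most used to the least used value.
        "sources": Counter(filled("source")).most_common(),
        "replied_to": Counter(filled("in_reply_to_screen_name")).most_common(),
        "quoted_ids": len(set(filled("quoted_status_id"))),
        "not_quoted": sum(1 for row in rows if not row["quoted_status_id"]),
    }


summary = summarise(sample_rows)
```

With the real file, the rows could be read with `csv.DictReader` and passed to the same function, assuming the file is a CSV with the headers listed above.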

Power BI

Time to import the data set into Power BI.

Let’s create some visuals and answer the set of questions.

  • In total there are 3,229 tweets sent.
  • The tweets are sent at 3,085 different timestamps, on 1,039 different dates.
  • There are 7 source types used.
  • Top 4 screen_names that have been replied to.
  • 310 ID numbers have been quoted.
  • I made a top 5 of the most active days of tweeting.
  • There are 545 tweets retweeted; the original data set has no column showing whose tweet has been retweeted.
  • Most tweets are sent between 7 AM and 8 PM, and the top 5 hours are 7:00-8:59 (two hours), 11:00-11:59, 17:00-17:59 and 19:00-19:59.
  • The source Twitter for iPhone names a device. For the other sources I do not know the device.
  • The first tweet was posted on 21-11-2014 and the last one on 19-12-2019.
  • On average there are 3.11 tweets a day, counting only the 1,039 dates on which tweets were sent (3,229 / 1,039).
  • Most tweets were posted on Sunday, February 3, 2019: 48 tweets.
  • Looking at just the months, December is the month with the most tweets (390). Splitting it down to month and year, February 2019 has the most tweets with 190.
  • Using the Word Cloud option in Power BI I looked at the words used. I removed stop words, “https” and “rt”. The most used words are then atiku (462) and Nigeria (420).
  • I made a Word Cloud for the top 10 hashtags. First, I isolated the hashtags into a new column.
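Two of the points above can be checked or imitated outside Power BI. The sketch below first redoes the average with the figures from this post: 3.11 appears to be 3,229 tweets divided by the 1,039 dates with tweets, which is my reading and not something the report states. It then shows one possible way to isolate hashtags; the regular expression, the function name `top_hashtags` and the sample texts are assumptions for illustration only.

```python
import re
from collections import Counter
from datetime import date

# Figures taken from the post.
TOTAL_TWEETS = 3229
DATES_WITH_TWEETS = 1039
FIRST_TWEET = date(2014, 11, 21)
LAST_TWEET = date(2019, 12, 19)

# Average per date on which at least one tweet was sent.
per_active_day = round(TOTAL_TWEETS / DATES_WITH_TWEETS, 2)

# Average per calendar day, counting both the first and the last day.
calendar_days = (LAST_TWEET - FIRST_TWEET).days + 1
per_calendar_day = round(TOTAL_TWEETS / calendar_days, 2)

# A hashtag is taken to be "#" followed by letters, digits or underscores.
HASHTAG = re.compile(r"#\w+")


def top_hashtags(texts, n=10):
    """Count hashtags case-insensitively and return the n most common."""
    counts = Counter(
        tag.lower() for text in texts for tag in HASHTAG.findall(text)
    )
    return counts.most_common(n)


# Hypothetical tweet texts, not taken from the data set.
sample_texts = [
    "Let's get #Nigeria working again #Election",
    "RT: #nigeria decides",
    "No hashtag in this one",
]
hashtags = top_hashtags(sample_texts)
```

Lower-casing before counting keeps “#Nigeria” and “#nigeria” together, which is a design choice and may differ from how the Word Cloud visual counts them.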

PowerPoint

After I got the visuals, I opened PowerPoint to create a background for the report and to organise the dashboard into the end result. Because it is about tweets, I inserted the Twitter logo and used the Twitter blue colour.

#SWDChallenge; 2019 October

This month I’m participating in the #SWDchallenge. We are given a small table of 5 rows and 5 columns.

Question 1: Review the data in the figure. What observations can you make? Do you have to make any assumptions when interpreting this data? What questions do you have about the data?

Answer 1: The first thing I did was check whether the columns with percentages added up to 100. Neither column did, so I made new columns and removed the old ones. I assume Tier A+ is an improved version of Tier A. I want to know the price per account.

Question 2: Consider the layout of the table in the figure. Let’s assume you’ve been told this information must be communicated in a table. Are there any changes you would make to the way the data is presented or the overall manner in which the table is designed?

Answer 2: I only moved Tier A+ to the top and kept the rest of the row design the same. I have added the column Price per account ($K) before showing the Revenue ($M). I made all columns the same width and wrapped the text of the top bar.

Question 3: Let’s assume the main comparison you want to make is between how accounts are distributed across the tiers compared to how revenue is distributed and that you have the freedom to make bigger changes (it’s not required to be a table). How would you visualize this data? Create a graph in the tool of your choice.

Answer 3: I will use PowerBI as the tool of my choice and start with loading the original data. I added 3 columns in PowerBI using DAX formulas.

% Revenue =
DIVIDE ( 'EXERCISE 2 1'[Revenue ($M)]; SUM ( 'EXERCISE 2 1'[Revenue ($M)] ) )

% Accounts =
DIVIDE ( 'EXERCISE 2 1'[# of Accounts]; SUM ( 'EXERCISE 2 1'[# of Accounts] ) )

Price per account =
ROUND (
    DIVIDE ( 'EXERCISE 2 1'[Revenue ($M)]; 'EXERCISE 2 1'[# of Accounts] ) * 1000;
    2
)
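For readers who do not use DAX, here is a rough Python sketch of the same three calculations. The tier figures are hypothetical, because the challenge table is not reproduced in this post, and the function name `add_measures` is my own. Returning `None` when dividing by zero is meant to imitate the blank that DIVIDE returns in that case.

```python
# Hypothetical figures; the real table from the challenge is not shown here.
tiers = [
    {"tier": "A+", "accounts": 20, "revenue_m": 5.0},
    {"tier": "A", "accounts": 80, "revenue_m": 10.0},
    {"tier": "B", "accounts": 100, "revenue_m": 5.0},
]


def safe_divide(numerator, denominator):
    """Divide, returning None instead of raising on a zero denominator."""
    return numerator / denominator if denominator else None


def add_measures(rows):
    """Add the share of accounts, share of revenue and price per account."""
    total_accounts = sum(row["accounts"] for row in rows)
    total_revenue = sum(row["revenue_m"] for row in rows)
    result = []
    for row in rows:
        per_account_m = safe_divide(row["revenue_m"], row["accounts"])
        result.append({
            **row,
            "pct_accounts": safe_divide(row["accounts"], total_accounts),
            "pct_revenue": safe_divide(row["revenue_m"], total_revenue),
            # Revenue is in $M, so multiplying by 1000 gives $K per account.
            "price_per_account_k": (
                None if per_account_m is None
                else round(per_account_m * 1000, 2)
            ),
        })
    return result


measured = add_measures(tiers)
```

Because both percentages are computed from the column totals, they add up to 100% by construction, which was the reason for replacing the original percentage columns.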

I chose the default colour theme and used two shades of blue from it.

The chart is called a line and column chart. The columns show the percentage of accounts and the line shows the percentage of revenue, sorted with the highest # of accounts first. In the tooltip you can read how many accounts are behind each % of accounts.

This is my visualisation:

#SWDChallenge; 2019 May; First time; part 6; Feedback from Cole Knaflic @storywithdata

I got homework for May 20, when I’m done with the Techionista training.

#SWDChallenge; 2019 May; First time; part 4; Progress

  • First step: make a table of the worksheet. Right now it has additional rows that do not contain the right information
  • Load the Excel file into Power BI and do some transformations
    • Fill blanks in the “week nr” column
    • Replace the 1 null value in column “onderdeel”
    • The columns “begintijd” and “eindtijd” need to hold only the time, not the date
    • Recalculate the column “totale tijd”
    • Round the column “in uren” to 2 decimals
  • Oh, the column “totale tijd” is not giving the right values. I want to see the number of minutes in there, so back to Edit Queries to find out why. Ah, the Data Type was Any; I changed that to Duration in minutes
  • Yes, my data looks good
  • Add the Excel file with the advised hours from Microsoft and Techionista
    • Do some transformations
  • Check the relation between the two tables
  • First visualization: a bar chart showing my hours, compared with those of Microsoft and Techionista
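The transformations themselves were done in the Power BI query editor. As an illustration only, the same clean-up steps could look like this in Python. The sample rows, the function name `clean` and the replacement text "Unknown" for the null value are hypothetical, and the sketch assumes a study session never runs past midnight.

```python
from datetime import datetime

# Hypothetical sample rows; the column names are the ones from the worksheet.
log = [
    {"week nr": 1, "onderdeel": "Python", "begintijd": "09:00", "eindtijd": "10:30"},
    {"week nr": None, "onderdeel": "Python", "begintijd": "13:15", "eindtijd": "14:00"},
    {"week nr": 2, "onderdeel": None, "begintijd": "19:00", "eindtijd": "21:10"},
]


def clean(rows, missing_subject="Unknown"):
    """Fill down the week number, replace a missing subject, recompute durations."""
    cleaned = []
    last_week = None
    for row in rows:
        # Fill down: a blank week number takes the value of the row above.
        week = row["week nr"] if row["week nr"] is not None else last_week
        last_week = week
        start = datetime.strptime(row["begintijd"], "%H:%M")
        end = datetime.strptime(row["eindtijd"], "%H:%M")
        # Total time as a number of minutes.
        minutes = (end - start).total_seconds() / 60
        cleaned.append({
            "week nr": week,
            "onderdeel": row["onderdeel"] or missing_subject,
            # Keep only the time part, without a date.
            "begintijd": start.time(),
            "eindtijd": end.time(),
            "totale tijd": minutes,
            "in uren": round(minutes / 60, 2),
        })
    return cleaned


cleaned = clean(log)
```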

Conclusion hours by subject

The red bars show the hours that were planned by Techionista and the black bars show what Microsoft advised; for 3 topics Techionista scheduled fewer hours than Microsoft advised. Green is what I actually spent. For 3 subjects and the capstone I spent more hours than Microsoft had advised.

The second visualization I want to make is a line chart showing the time spent per day from the beginning to the end.

  • First I added a new table with dates, so I could use the date as the x-axis
  • Then I made a chart where I visualised the hours per day, with the topic as legend, only to realise that a line chart is not clear. So I changed it into a bar chart, which gave me the picture I wanted to see.
  • I also made a table to show the description by subject
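The idea behind the separate date table is that every day appears on the x-axis, including days without any study time. A small Python sketch of that idea follows, with hypothetical entries and function names (`date_table`, `hours_per_day`) that are not part of the Power BI report.

```python
from collections import defaultdict
from datetime import date, timedelta


def date_table(first, last):
    """Return every date from first to last, both included."""
    days = (last - first).days
    return [first + timedelta(days=offset) for offset in range(days + 1)]


def hours_per_day(entries, first, last):
    """Sum the hours per day and per subject, keeping days without entries."""
    totals = {day: defaultdict(float) for day in date_table(first, last)}
    for day, subject, hours in entries:
        totals[day][subject] += hours
    return {day: dict(subjects) for day, subjects in totals.items()}


# Hypothetical study log: (date, subject, hours).
entries = [
    (date(2019, 5, 1), "Python", 1.5),
    (date(2019, 5, 1), "Power BI", 2.0),
    (date(2019, 5, 3), "Python", 0.75),
]
per_day = hours_per_day(entries, date(2019, 5, 1), date(2019, 5, 3))
```

In this sketch the day without study time (2 May) is still present with an empty dictionary, so a chart built on it would show a gap instead of skipping the date.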

Conclusion hours by day and subject

The further I got into the study, the more hours I spent per day.

#SWDChallenge; 2019 May; First time; part 2

The new challenge is here: Challenge May 2019
This month’s #SWDchallenge comes from guest author Mike Cisneros.

The challenge

Go out and collect a dataset of your own, analyze it and create a graph visualizing your findings. (Remember: sometimes the smallest, most specific stories can tell the most universal truths.)

My reaction

Oh, I wanted to join, but I do not have a dataset of my own. Okay, I will join next month.


#SWDChallenge; This looks interesting and challenging

Today I was reading a post on LinkedIn and thought: this looks interesting and challenging. It was a post about storytelling with data, and they challenge people to practise by making a visualisation in the first week of the month and sharing it. I’m thinking of taking up this challenge this month, to see how far I can get with the knowledge I have right now and my available time.