Challenge on LinkedIn

Wednesday, April 22, 2020, day 41 of the lockdown

Working at home 4 days a week, teaching at home 4.5 days a week I received a challenge from my linkdin contact Ijeoma Irene Agbugba and I have accepted it. It is the following post

The challenge

The result

This is the report I made out of it

The Road

How did I get there.


First, I did a check in Pyton to get an idea about the data, download the file to read my analyse.

Recap on what I found in Python.

  • The data has 3,229 rows (= messages send) and 13 columns

What the columns headers mean, looking at the data in connection to sending a tweet (in order of the file).

‘created_at’When has the tweet been posted
’text’The text of the tweet
‘in_reply_to_status_id_str’An ID number of another tweet if it is a reply
‘source’What source has been used to post the tweet
‘in_reply_to_screen_name’The screen name of the poster of the replied tweet
‘contributors’Column has al blanks for this dataset
‘quoted_status_id’An ID number of another tweet if it has been quoted
‘quote_count’Column has al blanks for this dataset
‘reply_count’,How many times has there been a reply to this tweet
‘retweet_count’How many times has this tweet been retweeted
‘favorite_count’How many times has the tweet been liked
‘retweeted’Column has al blanks for this dataset
‘followers_count’Column has al blanks for this dataset
  • All tweets are unique in text.
  • The messages have been sent at 3,085 different timestamps
  • 7 different sources have been used
    • 3,097 times Twitter for iPhone
    • 62 times Twitter Web App
    • 61 times Twitter Web Client
    • 4 times Medium
    • 3 times Persiscope
    • 1 time Twitter Media Studio
    • 1 time Facebook
  • There are 65 different screen_names that have been replied to.
    • 943 times there was a reply to atiku, the next in line was bjay75 who got 5 times a reply
  • 310 id’s have been quoted.
    • For 2,913 messages the quoted_status_id was not available.

Power BI

Time to import the data set into Power BI

Let’s create some visuals and answer the set of Questions

  • In total there are 3,229 tweets send.
  • The tweets are send at 3,085 different timestamps. At 1,039 different dates.
  • There are 7 source types used.
  • Top 4 screen_names that has been replied to.
  • 310 ID numbers have been quoted.
  • I made a the top 5 most active days of tweeting.
  • There are 545 tweets retweted, in the original data set, there is no column where to find whose tweet has been retweeted.
  • Most tweets are sent in between 7 AM and 8 PM, and the top 5 is 7:00-8:59; 11:00-11:59, 17:00-17:59 and 19:00-19:59
  • The source twitter for iPhone is mentioned, so that is a device. With the other source I do not know the device.
  • The first tweet is posted at 21-11-2014 and the last one at 19-12-2019.
  • On average over this full-time frame there are 3.11 tweets a day.
  • Most tweets are posted at Sunday, February 3, 2019 and that was 48 tweets.
  • Looking at just the months, then December is the month where the most tweets are posted (390), when splitting in down to the month and years February 2019 has the most tweets with 190.
  • Using the option Word Cloud in PowerBI I looked at the used words. I removed stop words and “https” and “rt”. Then the most used words are Nigeria (420) and atiku (462)
  • I made a Word Cloud for the top 10 hashtags. First, I isolated the hashtags into a new column.


After I got the visuals, I opened PowerPoint to create a background for the report and to organise the dashboard. To get the end result I got. Because it is about tweets I have inserted the Twitter logo and the blue colour

Creating a PowerBI visual from a Whatsapp group

Saturday February 28 2020, I bumped into a post at LinkedIn from Obinna Iheanachor . It was a second post about how to visualize a whatsapp chat into PowerBI. It looked interesting and I wanted to try it out.

Step 1) extracting whatsapp chat data

That was not that difficult, I took a group with data since April 2018 and with 34 members. All members are family of each other.

Step 2) data cleaning and preparation in Python notebook

I need to have the following in Python: pandas, matplotlib, regex, Dateparser and numpy. Mmm I don’t seem to have Dateparser. Let’s see if I’m able to ad this to my python.
Yes while using the Anaconda prompt I managed to install the missing package.
This is a step that took time, because my data was different than the data in the blogpost. I ran into different errors that I needed to solve. It was nice to look into the reason of the error and changing the code in such way, that it fitted my data. I did more than enough reading into the Python documentary.

But I did it, on Tuesday I had a csv-file ready to import into PowerBI

Step 3) data modeling in PowerBI

I started with following the steps and saw my visualizations and thought, this is not really on how I want it.
I went to PowerPoint to create a nice background with a layout and a logo. The charts fitted into the background and I liked the result. This time I cannot share the presentation online, because it is private information about 34 members. But I would like to share that we never forget a birthday, because the two most words are happy birthday.

Step 4) using other data

My son saw the visualizations and he wanted to know how it would look for a WhatsApp group from him. So, he extracted the data and gave the txt file to me.
There was a difference in his and my text file. The date and time where formatted different and I was using that format in the code. That meant I needed to run line by line to check the result of the code and yes, some code needed to be adjusted.
Also, in PowerBI in needed to adjust some of the code that I used by importing the data. I was funny to see, that this group of kids is talking the most about, who can play?

2019-10-25 My first Hackathon

Wednesday, we received the last information about the hackathon. I made a Trello Board, connected with Planyway so I could view the agenda in that way and easily tick off when events where past.

Friday morning 7 AM was the time that I left to travel to Wageningen. The trip went smooth and at 9:00 I walked into the building, we started with coffee and tea.
At 9:30 we were officially welcomed and the four different challenges where presented
At 10.30 and at 11:00 we had the chance to talk to the challengers to find out more about the challenge. At home I had looked and thought that the Hendrix challenge was the coolest. During the presentations I also found the Lely & Rovecom Challenge interesting. At the end I did choose for Hendrix. 5 other people also choose this option and together we made a team.

Here a photo of the team and the challengers in blue.

When the teams were formed it was team for a lunch. After a good lunch we received the data and started exploring the data. Some with Python, some with R, I used Python and I used Power BI to get visual insides.

There was a lot of information in the data

The goal was: predict animal performance in warmer barns. So that is what we were looking for. I cannot tell much about our findings, because we explored the data, but we did not become owners of the data nor of the results.

At 16:00 we went to have a drink and a talk. We got and inspiration talk by Arthur Mol and information about the StartHub form WUR. 18:00 time for dinner and at 19:00 back to the data.

We were working closely to getter in different tools to find solutions and answers to the hypothesis. Just after midnight it was enough for me. I could barely keep my eyes open, lets even think. So, I unrolled my yoga math in another room and crawled into my sleeping bag.
Two other team members had left a little bit earlier to sleep home. 3 continued. 1 fell asleep in the chair while working and 2 fell on their sleeping mattress at 6:30.

At 8:00 it was breakfast time, this was served in the room we were, so at 7:50 the catering switched on the lights and we all were woken up.

After Breakfast time to pitch about the status of the project and back to sprint to the end of the project.

At 12:00 time to hand in all the work and time for lunch.

12:45 pitching time. 6 minutes per group. Whauw there were some stunning results. After the pitched it became thrilling; we had to wait for the jury to come with a result.

There were 3 prices for some terrific results. Unfortunally not for our result. On the other hand, the people form Hendrix were happy we had analysed the data and told them insides on it. They had not checked the data before this challenge.

For me I got what I had wanted to get. The experience of being present at a Hackathon. Seeing how other people deal with this. Working in a team. Meeting new people. Expanding my network with data lovers in the agri sector.

LinkedIn Analytics in Power BI

Thanks to Warren Dean for his blog, about how to make a Dashboard for LinkedIn Analytics

I found this blog at the beginning of September and I do want to analyse my LinkedIn information.
So, at September 6 I requested LinkedIn to send me my information. While waiting the 24 hours it takes LinkedIn to make my zip-file I started on exploring the Power BI file.

Warren used elements to create the top of the Dashboard. I used the lesson from a webinar (start at 39:40) and created a background in PowerPoint using Warren’s layout.
My last name is longer than Warren’s, so that gave a challenge to let it fit nicely. In the title I also have included since when I’m a LinkedIn member.

Fitting the slicer is the next challenge, that I defeated, by reshaping the slicer, I removed the title of the slicer. I also gave the slicer a fixed end date, because the data received from LinkedIn is static. When I ask an update from LinkedIn, I need to update this end date.

Waren used the default theme colours. I do not like to use the default so I used the website HTML color codes to find the RGB code for the LinkedIn logo and used that colour.

Top 20 Count of Company

The chart “top 20 count of company” shows in my dashboard more than 20 companies, so I checked the settings of the chart. I learned that it is because if there are more companies with the same count, they are also shown, so I change the title of the chart to “Where do the most of my connections work and what is there position” I made it a top 10.

Count of Share by Hour & Count of Share by Weekday

Here I changed the title and the colours.

LinkedIn Activity => Reactions per day

My linkedIn activity is not as high as Warren, so I decided to skip this cart and to change it into reactions per day.
Since this year we can choose for 5 types of reactions. I have given all 5 types the same colour and put them in a stacked column chart. When hoovering over the cart it is visible.


I did not write article’s so I removed that chart.

The result

Star Trek: Inspiring Culture and Technology I, Unit 4 Why Star Trek? > Imagined Technologies / Artificial Intelligence

Start Trek and Imagined Technologies

In Star Trek there was a lot of technology that was appealing to the imagination of the viewers in the ’70’s of the 20th century. Some items we have right now have been developed based on the imagination of Star Trek.

I was impressed by the extra material where James Doohan is telling about his interest in science and how he as Scotty inspired to many engineers. And the interview with Patrick Steward where he talks about Neill Armstrong’s inspiration to become an astronaut. Namely the role of Captain Picard. I knew there where fans, but that it influenced people and society so much I had not realised. While in an earlier blog I wrote down, that the role of Data maybe has inspired me to go into Data Science.

For this Starlog we got the questions:

  • “What Star Trek technology is on your list of must-haves?”
  • Could the Star Trek universe exist without this type of technology?
  • How would it be better (or worse) with (or without) this technology?

The technology on my wish list is the big touch screens and the real interactive talking with the computer. We have a start, there are touch screens and the digiboard and schools are getting more and more interactive. Also asking questions to siri, google and windows is starting off.

The Star Trek universe could not have existed without the technology. Then it would not have been SF. Without the technology there would have been something missing in the series. No talking Data, no beaming, no easily communication, no great looking screens.

Tech Showcase: Artificial Intelligence

In this model Mark Teerlink chief business strategist from IBM Watson is talking about AI and how it developed in the present time.

It would be really nice if AI would go into the direction of Data, but it also go into the direction of Lore, so we always have to stay keen on ethics with every small step we take in AI. I don’t think we should fear AI, we should fear the county’s developing AI, as long as they are humanly AI will stay humanly.

Promotion to Lieutenant Junior Grade

Challenge myself to change a set of Excel files into a Power BI report; part 3

Designing a model

To make a report I need to have information in Power BI. I can just import the Excel files and see what I have. But that is not what I have learned taking the course by Avi Singh. First I need to design a model with Data and Lookup Tables.

So what information is in my Data Table and what in my Lookup Table.

Data Table

A Data Table is the table that contains information. So in those tables I need to load the information about required and returned supplies.

Lookup Table

A Lookup Table is the table with fixed values. So in those tables I need to load the information about the weight of each article, and a calendar table, because I might need the years.

Relationship between Data and Lookup Table

In the original information I have a column with item number so I need to get that in the Data Table and in the Lookup Table with the weights of the supplies. Oeps not every item has an item number so that cannot be the key. Need to work with the English name.

The original information has no date in the worksheet, only in the file name. so I need to create a column with the year so I can make an other relationship

Work in Power BI

First step: Making a table with all the requested items.

There are 4 Excel files for each year that we have logged information. This Excel files have a lot of worksheets. From each file we only need the worksheet with the requested totals. This sheet has rows with the products items and columns with the value per booth.

Per file a made 1 table in Power BI. I used Power Query Editor to

  • Remove top blanc rows;
  • Remove rows in the file that were blanc;
  • Promoted the Headers, automatically the types of the columns were changed and I check and agree with the change;
  • Made sure my columns had the right names (the same names over the four tables);
  • Removed empty columns and columns I do not need the information from;
  • Added 1 column with the date of the event and made the Data Type: Date;
  • I unpivoted column, so that the Booth name became an Attribute column and the values became a Value column.

When I had this 4 Tables I used the Append Queries option to append this Tables into 1 Table and changed the name of the value column into Requested value.

At the start of the day, they booths receive 75% of the value of the items they have requested, so I made a new column for that.

Second step: Making a table with all the returned items.

For 2 years we also log the returns per booth to find out how much do they return at the end of the day. So next year we can advise them what to request.

These 2 files I also have put into 1 table, with the following steps

  • Remove top blanc rows;
  • Promoted the headers, automatically the types of the columns were changed and I check and agree with the change;
  • Remove rows in the file that are empty;
  • Every item is having two rows of values, in Excel Merge cells was used, in Power Query this ends up as a filled and empty row in the item column, so I used filled down as an option;
  • Removed empty columns and columns I do not need the information from;
  • Added 1 column with the date of the event and made the Data Type: Date;
  • I unpivoted column, so that the Booth name became an Attribute column and the values became a Value column.

When I had this 2 Tables I used the Append Queries option to append this Tables into 1 Table and changed the name of the value column into Returned value.

For 2 years we also have on paper what they have requested extra during the day, this is not in a file.

Third step: Making lookup tables

I made 2 lookup tables. Information about the supplies and a Calendar Table. I’m still thinking about making a table with booth information.

Fourth step: Making relationships

  • Made a relation between items required and the items lookup table;
  • Made a relation between items returned and the items lookup table;
  • For both data tables I also made a relation to the Calendar table.

Fifth step: First visualisation

In a column chart I want to show per item what they requested, what they got at start and what they returned at the end of the day. With using a filter for year and a filter for booth.

It is not working

I have been working on the above, but I came to the conclusion, that on the moment, with the knowledge I have of the Power Query Editor, I cannot shape the data into a nice table with working relationships.

The requested and what they got at the start of the day is nicely in the viz, the returns are not responding to the filter of year.

So, I go to create a new excel file which fits the requirements I need for PowerBI. I’m lucky I have access to the original data, so I can do this.

Continue in Excel

To start in Excel, I copied the Power BI table with the requested items and manually I have put in the return and while busy I looked for the paper work about the extra requested during the day. We only had 2018 so I have put those in as well.
There are some od numbers in the returns that do not make any sense. This might be explained by the fact that the counting is done at the end of the evening, by teenagers and it is done fast, because everybody wants to go home.

Back to Power BI

I have imported the new excel file and now I can make a Dashboard based on 1 table.

#MakeoverMonday; 2019 wk35; Fall is the favourite season for most Americans

#MakeOverMonday Week 35 2019

This week it is a small dataset that needs to be put in a visualization. (12 rows and 3 columns).
I first have recreated the original visualization, with the same layout.
Then I judged it at the following points

What is the original story:

While weather predictions can be unreliable, it is safe to say that a quarter of Americans are looking forward to the summer months. In a survey of the four seasons, summer rates high as 25% of Americans look forward to the warmer weather. Fall is the favourite season of the nation. They show us per season what is the favourite for each group and they conclude that fall is the favourite season of the nation. They also wrote down the percentage on each bar.

What could be improved

  • Do not use the percentage is the graph, because it does not add up to 100% in total, so the viewer starts to wonder why.
  • Show it in a different way, because it is not visible in one blink that fall has the highest value. From the standard Power BI the following options are possible.
    • Stacked Bar Chart, Stacked Column Chart, Stacked Area Chart, Ribbon Chart, Pie Chart, Donut Chart, Treemap split into the age groups
    • Bar Chart, Line Chart, Area Chart, Pie Chart, Donut Chart, Treemap without splitting into the age groups

I have chosen for the Stacked Column Chart. The Data labels are off and I also wanted to switch the tooltip off. This is to stop the confusion about the percentage. I wanted to keep the tooltip, so I made a new option in data ‘no preference’, I did this because of the fact that I’m missing data and how to fix this. This are 3 rows and it makes the total preference value per age group 100%.

Who is the audience?

It was an article on YouGov. YouGov is a British international Internet-based market research and data analytics firm. The company’s methodology involves obtaining responses from an invited group of Internet users, and then weighting these responses in line with demographic information. Reading this I conclude that they did on online survey and this is the result of the survey. The audience are people who go to this side to read what is there. So the general public.

What is the goal

It is a news article in June 2013 in the Lifestyle section.

Go to the visual

#MakeoverMonday; 2019 wk33; Practice Python analysis with a #MakeOverMonday dataset

#MakeOverMonday Week 33 2019; A bird’s-eye view of clinical trials

In this blog I have explored and visualized a dataset with Python.

Exploring the data

The dataset A bird’s-eye view of clinical trials is provided as an excel file. It is bigger than other weeks, this file has 13,748 rows and 11 columns.

First I need to know what is in the dataset and I use the .head() comment.

Then I want to know the dtypes.
There are 3 columns with numbers and 7 with text (categorical variables).

For the 3 columns with numbers I can use the function .describe(), for 2 out of 3 that will give a funny result, because it is no use to describe the ‘Start_Year’ and ‘Start_Month. The ‘Enrollment’ is the only one that can give useful answers to this request.
Now I need to get some domain knowledge, what does enrollment means? Ah found it in the text of the original visualization: Enrollment are the numbers of patients enrolled.
Result: The mean is 441 patients with a max of 84496 and a min of 0. The value 365 in the 75% quartile tells me more trails or done with less than the mean

Next up is counting the unique categories of the categorical variable’s.
For this I need to write a definition:

There are 10 different sponsors, 7 Phases, 9 different States. For in total 867 conditions/diseases.
There are no double NCT’numbers so every row is unique. There are 13.434 different titles and 13564 summaries.

Visualizing the data

Bar Charts

First, I will examine the frequency distributions of the categorical variables, ‘’Sponsor’, ‘Phase’ and ‘Status’ with bar charts. Visualizing the others with bar charts is crazy, because of the number of bars you will get.

The ‘Sponsor’ GSK has with almost 2.500 trails the highest amount of trails

Looking at the Phases, most trails have the label Phase 3

And logically the most trails have the ‘Status’ Completed.


I made histograms of the numeric variables ‘Start_Year’, ‘Start_Month’ and ‘Enrollment’.

The histogram of ‘Start_Year’ shows that in the beginning there where not much trails and that from 2000 the amount of trails went up with a peak in 2005, 2006 and 2007.

The histogram of ‘Start_Month’ shows a more or less even distribution.

The histogram of the ‘Enrollment’ shows what was already written in the describe function of the column. Most trails have a low amount of participants. So, it is right skewed.

Kernel density estimation

For ‘Start_Year’ and ‘Enrollment’ I would like to know the kde (Kernel density estimation) to find out what the density of occurrence of the trails is.
For ‘Start_Year’ the graph does not give extra insight, for ‘Enrollment’ it does.

This kde shows that there are also some large trails with a bigger group of participants.

Scatter Plot

I have made a Scatter Plot. A Scatter Plot is used to find out if there is a relationship between two variables. I wanted to see what the relationship was between ‘Enrollment’ and ‘Start_Year’. In this Scatter Plot it is visible when the bigger trails have started.

Box Plot

With a Box plot I was able to make visible when what sponsor was doing its trails.

Now I have explored the data and I can start the next step. Telling a story with visualizations in power BI.

#MakeoverMonday; 2019 wk32; Uses of 8 different energy sources to generate electricity in the UK part 2

#MakeOverMonday Week 32 2019

I have left this challenge for a week and enjoyed summer holiday season with the kids. In the meantime some people have given me suggestions looking at my earlier blog. So lets see what I can do today.

First step I have changed the table in Power BI, I now have a table with a lot more rows and lesser columns. My columns are ID, Timestamp, Attribute and Value.

The relation between the tables is automatically detected at the ID numbers. Doing a check with creating a table shows there is a mistake in this relationship. Digging for the answer is just giving me more questions about the data. So lets stop digging and create a visual with one of the story’s i wrote in my first blog.

The story I pick is the change in use over the years of 3 different types of source: Nuclear, fossil based and renewables.

  • I made two Measures in Power BI to group the sources together.
  • I have made two stacked area charts.
    • One that shows everything over the years
    • One where you can breakdown to a year / month and see the daily use.
  • I did this with the first table I had loaded earlier this week and not with the second table I made today
  • For the colours I have chosen purple for Nuclear, was temped to do red because of radiation risk, but found that to much. Of course fossil became grey, connected to the colour of coal. And green for the renewables.
  • To make sure that the filter is only for the second graph I used the option edit interactions to disconnect the filters from the first graph.
  • Kept the text in neutral grey and only made the names of the 3 groups in the text bold.
  • Both charts got fixed axis. With the automatic Y-axe it did not start at zero with the result that the bottom source was smaller than it should be. The second chart also as a fixed end, in this way the y-axe is the same for every filter that is going to be applied. I found out that February 2012 was the month where the claim of sources was the highest.

Challenge myself to change a set of Excel files into a Power BI report; part 2

Description of the Excel files

Per year there are two Excel files and every Excel files has several worksheets. From each file I need one worksheet. The worksheet with the requested and additional supplies per booth and the worksheet with the returns per booth and the total stock in the evening.

The worksheet with the returns is first having columns that describe the different supplies, followed by columns that inform the user about how much a standerd amount way’s. Followed by booth name columns. Every article has two rows of information, weight and count. We need the count by article. By adding the forms from 2018 I realised that we had changed some articles and that we do not have the weight. While making the inventory has been done with the weight. We need to come up with a solution of that, even so if we keep the old system up and running, because we need to know the count of the items coming November.

The total required is a worksheet where in columns is written what every booth wants to receive. The evening before the FoodFair the booths are receiving 75% of what they have requested. During the day they can request for additional supplies, since 2018 that is also filled in on forms, that are put in one Excel file.

Both sheets are not complete and are having missing data, lets see how that works out when I import them to the Power BI Query Editor and transform it into the right format.

Challenge myself to change a set of Excel files into a Power BI report; part 1

Description of the challenge

Every year in November there is the FoodFair of the church we are going to. My husband coordinates the supplies room and he has a set of Excel files (with formulas) to know how much supplies he uses a year, how much is left and how much he needs to buy again.

There are several booths with different food and different needs of supplies

At the end of the day we are left with a bunch of handwritten forms that tell us how many supplies have been given out and returned per booth. This handwritten forms go into the Excel file. And we do have a handwritten form with counted stock at the end of the evening. This form is also copied into a Excel file.

I gave myself the challenge to build a model from the Excel files that can be used on a yearly base and to make a report that shows what supplies are requested for and what is been given back per booth

As a by the way, writing this blog also teaches me more about possibilities in WordPress, because I want more than just a plain black text.

Start of the project

I see this as a real use case, because there is data in Excel in a reader friendly way, so that needs to be converted. The data is in several different files and collected by hand over more than one year.

I have domein knowledge, because I have been helping my husband with inputting data in the files and working with him during the FoodFair day’s.

The data that is there in Excel are two files a year. 1 file has the returns per booth and the stock at the end of the day. The other file has the requested supplies per booth. And there is a lot more information in the files on other sheets.

The forms from 2018 have not been put into Excel yet, so that is what needs to be done first.

My learning path Power BI part 1

The learning path Power BI started in april 2017 when I did my first Analyzing and Visualizing Data with Power BI online training in edX. At that moment my passion for Power BI was born.

Overtime I have been experimenting with Power BI and during my Data Science Professional Microsoft Track powered by Techionista learned more about Power BI.

In the past weeks I have been busy with exploring Power BI, by just diving into #MakeOverMonday challenges and working with trial and error. This way of working helped me to understand the program, but it also fed my hunger in getting to know more about Power BI.
In my search for video material I came across Avin Singh, he has a nice video series where he starts from the beginning with introducing Power BI. In this blog are my learning notes. I do advise you to watch the video series to understand Power BI better.

Before I started with this series I started with the most basic one “how to install Power PI”.
In this video he talks about the two way’s to install Power BI

  • From the Microsoft website
  • From the Microsoft store

My Power BI was installed from the Microsoft website and the plus point that you do not have to update it every month yourself, made me decide to deinstall that version and to install Power BI from the store. Within an hour I went back to the website version. I wanted to change something in the settings and I could not. I kept giving me a popup error message. I did not want to figure out way and decide that the website version and updating it manually every month work well in the past and I would keep it like that.

In Get Data he explains how to import data and how you can edit it with the query editor and how the query editor helps you with making a documentation so that somebody else (or yourself after some time) knows what has happened. Good documentation is important in Data Science, because in that why you can explain and prove to others what you have done.

In Relationships he tells you how to make a good data model. He explains about data tables and fact tables.

Up to know repeating of the knowledge I got from earlier training and self-exploring of Power BI.

In the section about DAX I get to learn some new facts.
A calculated column is a nice way for the human brain to see in the tables what is happening. It makes the file grow bigger in size.
We want to have a small file, because a Power BI file runs in memory of the computer.
The storage of a big file is not a problem, running a big file in memory can give a problem.
So it is better to use a measure. This does not make the file bigger. To write them you need to have more DAX knowledge. In his video’s he explains the basics.

There are two types of measures “Implicit” and “Explicit”.
Implicit means you take the column from the table and put it in the visualisation.
Explicit measure is a measure you made by using a DAX-formula.
He explains them and tells witch one is better to be used. To make that clear he really gives a nice picture why you are building a Power BI file.

The Data Scientist is the author from an Excel or a Power BI file, that file is being published in the cloud of Power BI and becomes the only file of truth. That file can be made excisable for consumers to use the data with the program they want.
And at that point it becomes very important if you have use implicit or explicit measures. Implicit measure can not be put in a pivot table in Excel, while explicit can.
A lot of users use Excel, so you need to build for their needs.

He tells a lot more about DAX measures and I start to understand it.

At the end he creates a report, that turns out to be a dashboard and online published. I did make the report he made, but did not publish it. I also changed how to put the text and images. I first made a jpg by using PowerPoint and loaded that as the background of the report (tip from Marc Lelijveld during a webinar I followed in July 2019)

Here is the created visual.

#MakeoverMonday; 2019 wk28; Asylum applications in Europa part 4

#MakeOverMonday Week 28 2019

Go to the power BI viz pag 3

In the facebook group I got feedback on iteration 2

Loads better! I felt part of the conversation and knew immediately what story you were trying to tell!
Fantastic job! Great blog post! I hope others are encouraged to try this after seeing your journey!


  1. Is there a legend for the first two graphs?
  2. Do they need to be multicolored or could you make them all the same color and then call out the years you want to point out in a highlight color?

My answers

  1. There is no legend for the first two graphs, if I give one it will be a list of 28 countries and that is a lot.
  2. Looking at answer 1 in combination with question 2 I’m going to make a stacked column cart with 3 layers, 26 countries together and the 2 that I’m talking about separated and than I can also make a legend for the two graphs.

How did I do this.
I wanted to make a measure in Power BI, where I calculate the 26 rows and do not calculate the values of Germany and Hungary. I could not get it done with a DAX formula, so I went back to the source, the excel file and made a new row with the right values and imported the file as a new file.

Who can tell me how to do it with a DAX formula?


Another graph that might be interesting is the variance between accepted and denied which you might be able to use a waterfall chart to show in Power BI.

I did not start working on that one yet.

#MakeoverMonday; 2019 wk28; My blog part of Power BI news

Today I got a message from twitter that there was a post with my name so I went to it. I was wondering how did I contribute to this news. So, I followed up the link and found a great website with a nice collection of Power BI news.

I started reading and at when I was at the topic “Business” I found a my blog about the asylum applications in Europa part 3.

I feel proud about this.

#MakeoverMonday; 2019 wk28; Asylum applications in Europa part 3

#MakeOverMonday Week 28 2019

Go to the power BI viz pag 1

Today and yesterday I have been working on a second report with the data given by #MakeOverMonday.

My biggest challenge was getting the data in the right format to create a normal line or bar chart, with the years on the x-as and the country’s being the values. I dove into one of the other reports and found out, that the format of the data had been changed. The years where no longer in separated columns, but moved into 1 column and that is what was in my mind also. So first I worked in Excel and created the format needed to make the report I wanted. It takes time to put it in the right format.

Lesson learned: if I feel the data is not in the right format, change it first before putting time in making a report and than need to restart again.

While I was busy in Excel I was like, this most be possible in Power BI with the Power Query Editor. So I went back to Power BI and have been reading carefully and found the option: Unpivot Columns in the Transform menu and it did the trick in a click

After the data was in power BI in the format, I wanted I did some cleaning of the data

  • Removing empty rows
  • Chancing the value “:” into an empty cell. This value was used several times, I did not mean 0 because there where also zero’s in the columns.
  • Changed “Germany (until 1990 former territory of the FRG)” into Germany, because all data was from 2009 and onwards.
  • Filtered out the information about the 28 European countries together
  • Changed “Total positive decisions” into “accepted”, because that fitted nicer in the graph of denied and accepted

This time I have build a 3 visual report

Visual 1 shows the total asylum requests in Europe over the year by country.
I noticed a peak in Hungary in 2015 and dove into news articles to find out when Hungary closed the borders for Middle east refugees with Croatia and read this was end 2015. So this news fact is visible in the graph.

Visual 2 shows how many decisions have been made. In Germany you can see that a request and a decision is not always in the same year.

Visual 3 shows the accepted and denied request. I have chosen for a clustered column chart instead of a stacked column chart. Because here it is easier to see what has been more.

For the 3 visuals I made the Y-ax the same scale.

#MakeoverMonday; 2019 wk28; Asylum applications in Europa part 2

#MakeOverMonday Week 28 2019

Feedback from #Make Over Monday review

This Wednesday there was the weekly review from the made visualization and here are the points I need to work on and be aware of for a next report

  • Make sure there is a title / header
  • Check the spelling of the tekst
  • When mentioned it is about people you do not have to specify that those are divided into two gender groups
  • The filter on the left hand side is big, better is to make it a dropdown list
  • The tables are not inviting to read them, remove them are use a kind of heatmap or only totals
  • Use a normal bar chart and put the number is full amounts, do not use K, because there are amounts below 1,000

Feedback from members of the facebook group BI Data Storytelling Mastery

  • What is the takeaway?
  • What story should I see?
  • I suggest considering something different than a stacked bar. I can see overall total but understanding difference in composition is hard for human brain.
  • In your blog can you tell why you chose your visuals, colors, fonts?
  • But most importantly the story you want to tell. Better story might be approved vs denied, overall trend. Is there a story in the reasons? Is there a story in the sex or age difference? Why should the viewer care about these numbers? Help us see the story from your point of view, the refugees point of view, the asylum counties pov.
  • Can you make the viewer want to investigate and find out more?

So I’m going back to an empty power BI sheet and rethink about the story that I want to tell with this data.

#MakeoverMonday; 2019 wk28; Asylum applications in Europa part 1

#MakeOverMonday Week 28 2019

Go to the power BI viz pag 2

This week we are working with data about asylum applications in Europa from 2008 until 2018. The data provided is part of a bigger dashboard. You can find the original at “International protection in the EU+: 2018 overview”. We received two datasets that was used for this two visuals

Looking at the data in Excel, the Age column is only giving total and not a differentiation in minor or adult. The decision column is also giving 4 other descriptions than the 4 used in the original graph. The two columns with totals are not giving the same results so it is not possible to combine the two sets. So not possible to find out how many of the applicant per year got asylum and what type of asylum. This is probably because of the length application.

I decided to make a visualisation where you can filter by country how many people are asking asylum and what the decision are by year. I could not connect the request and decision numbers to each other.

#MakeoverMonday; 2019 wk26; Alcohol Consumption By Country

#MakeOverMonday Week 26 2019

This week I want to participate in the weekly challenge of #MakeOverMonday lets see if I have enough spare time between jobapplications, family activities and Python for Data Science training. The last two weeks I could not find the time for the challenge.

This week it is a small dataset. It has 25 rows (country’s) and there alcohol consumption by capita. In the article I read that is by capita older than 15 years.
In the original visualization there is a bar chart of this 15 country’s.
On Wednesday I have seen the live webinar and was impressed with visuals other people made. Got a lot of inspiration and challenge to find out what I can do.

In my head grows the idea to at the capita by country, to show how many liters there is drunk by country. I found information about the capita per country for 2017 and have added this to the excel spreadsheet. Now I need to find how much of the population is above 15 years old.
No could not find it, so end of idea.

I just used the data I had a created some graphs showing what is in the data. The information I found about the capita by country in 2017 has been used.

I did submit it to #MakeOverMonday Challenge.

Next training step

While busy looking for a new employer I also have time to do some more DataScience training. I went back to my trusted platform and decided to start with a MicroMasters® Program in Data Science at UCSanDiegoX.

The first course is called Python for Data Science and has 8 modules, there will be cases with data on Kaggle.

The first module reviews what data science is and how to conduct data science research. It is a recap of what I have learned in the other models. Nice to get a recap and connect dots.

I’m taking this training because I want to practise more with data modeling and predicting results (Machine Learning)

Eerste dag Business Case of the Microsoft Azure Academy for Data Science, powered by Techionista.

Vandaag van start met de Business Case. Vijf bedrijven hebben vijf groepen een opdracht gegeven om mee aan de slag te gaan. Onze groep gaat van start met de aanwezigheid van supporters tijdens de thuis wedstrijden van Ajax in de Johan Cruijf Arena.

Data Science Track van Microsoft afgerond

Yes vandaag de certificering ontvangen als bewijs dat ik alle 11 onderdelen succesvol heb afgrond.

Begonnen op 1 februari 2019, afgerond op 30 april 2019, hieronder een overzicht van het aantal uren dat ik er in gestoken heb. De reistijden zijn niet verwerkt. De training vond 3 dagen per week plaats in de Amsterdam ArenA en de rest was thuis studie. De reistijd naar de Amsterdam ArenA is 3,5 uur per dag.

Het onderstaande rapport is een interactief rapport in PowerBI

De onderdelen die ik nu in theorie beheers zijn: Introduction to Data Science, Power BI, Analytics Storytelling for Impact, Ethics and Law in Data and Analytics, Querying Data with Transact-SQL, Introduction to Python for Data Science, Essential Math for Machine Learning: Python Edition, Data Science Research Methods: Python Edition, Principles of Machine Learning: Python Edition, Developing Big Data Solutions with Azure Machine Learning, Microsoft Professional Capstone : Data Science

Studie update

Vandaag Module 4 in de Python training afgerond.

Ik heb nu de theorie doorgelezen van:

  • Lists, subsetting and manipulating lists
  • Functions, Methods & Packages
  • Numpy, 2D Numpy arrays & Basic Statistics with Numpy

Tijd om pauze te houden en het in te laten zakken, morgen weer verder.

LinkedIn update

Whauw what a day, we got information about the products Microsoft has to do Data Science and about the backbone from the Azure Stack. We saw some gadgets. It was an interesting day.

#MakeoverMonday; Found an other interesting and challenging website

Today when I was looking around at a visual made at a previous challenge from #SWDchallenge I passed by an other challenging website: Makeover Monday there they request you to make a challenge on a weekly base and share it with #makeovermonday, So I now have two options to start practicing my storytelling and visualization skills in PowerBI

#SWDChallenge; This looks interesting and challenging

Today I was reading a post on LinkedIn and thought, this looks interesting and challenging. It was a post about storytelling with data and they challenge people to do a practice making a visualisation in the first week of the month and to share it. I’m thinking of taking up this challenge this month, see how far I can get, with the knowledge I have right now and my available time.

LinkedIn update

Today I finished the training Ethics and Law in Data and Analytics. It is not really a topic you would think about when starting a training to become a Data Scientist. Now at the end of the course I’m like yes this is a topic you need to be aware of when being a Data Scientist.

Amsterdam region has had a week spring break

Last week was an easy week in the learning proces. We did not have to come to the study location, so I only have been studying at home, next week the two day’s that are planned to study at home I cannot study, because my children are having there spring break.

LinkedIn update

Today was my 12th (weekday) in this study track and I finished the course Analyze and visualize data with PowerBI, looking forward to learn more about the tool PowerBI and to be able to work with it.

Start of the track becomming a Data Scientist

In this blog I want to keep track on how I started my first visibel steps in the world of Data Science.
In the past I already had undertaken actions that all have made me ready for this big step. Starting the training at Techionista: Becoming a Data Scientist with a certificate from Microsoft

Today January 28 2019 I started with the first track in the training: Introduction to Data Science