Practice Python analysis with a #MakeOverMonday dataset

#MakeOverMonday Week 33 2019; A bird’s-eye view of clinical trials

In this blog I have explored and visualized a dataset with Python.

Exploring the data

The dataset A bird’s-eye view of clinical trials is provided as an Excel file. It is bigger than the datasets of other weeks: this file has 13,748 rows and 11 columns.

First I need to know what is in the dataset, so I use the .head() method.
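This first look could be sketched like this (the filename and the tiny stand-in columns are assumptions for illustration, not the real file):

```python
import pandas as pd

# Load the Excel file (filename is an assumption; adjust to the actual download)
# df = pd.read_excel("clinical_trials.xlsx")

# For illustration, a tiny stand-in frame with the same kind of columns:
df = pd.DataFrame({
    "NCT_Number": ["NCT001", "NCT002", "NCT003"],
    "Sponsor": ["GSK", "Pfizer", "GSK"],
    "Enrollment": [120, 45, 300],
})

print(df.head())   # first 5 rows
print(df.shape)    # (rows, columns)
print(df.dtypes)   # column types: object vs int64
```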

Then I want to know the dtypes.
There are 3 columns with numbers and 7 with text (categorical variables).

For the 3 columns with numbers I can use the .describe() function. For 2 out of 3 that gives an odd result, because it makes no sense to describe ‘Start_Year’ and ‘Start_Month’. ‘Enrollment’ is the only one for which the summary statistics are useful.
Now I need some domain knowledge: what does enrollment mean? Ah, found it in the text of the original visualization: enrollment is the number of patients enrolled in a trial.
Result: the mean is 441 patients, with a max of 84,496 and a min of 0. The value 365 at the 75% quartile tells me that most trials enroll fewer patients than the mean.
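The describe step can be sketched like this (the enrollment numbers below are made up to mimic the shape of the real column):

```python
import pandas as pd

# Hypothetical enrollment numbers to illustrate .describe()
enrollment = pd.Series([0, 10, 50, 120, 365, 900, 84496], name="Enrollment")

stats = enrollment.describe()
print(stats)  # count, mean, std, min, 25%, 50%, 75%, max

# A 75% quartile far below the mean is a sign of right skew:
print(stats["75%"] < stats["mean"])
```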

Next up is counting the unique categories of the categorical variables.
For this I wrote a small function:

There are 10 different sponsors, 7 phases and 9 different statuses, and in total 867 conditions/diseases.
There are no duplicate NCT numbers, so every row is unique. There are 13,434 different titles and 13,564 summaries.
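The counting function could look like this (the column names and values are illustrative stand-ins):

```python
import pandas as pd

def count_uniques(df: pd.DataFrame) -> pd.Series:
    """Number of distinct values per column."""
    return df.nunique()

# Tiny illustrative frame (column names are assumptions)
df = pd.DataFrame({
    "Sponsor": ["GSK", "Pfizer", "GSK", "Merck"],
    "Phase": ["Phase 3", "Phase 2", "Phase 3", "Phase 3"],
    "NCT_Number": ["NCT1", "NCT2", "NCT3", "NCT4"],
})

print(count_uniques(df))
# When the NCT count equals the row count, every row is unique:
print(count_uniques(df)["NCT_Number"] == len(df))
```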

Visualizing the data

Bar Charts

First, I will examine the frequency distributions of the categorical variables ‘Sponsor’, ‘Phase’ and ‘Status’ with bar charts. Visualizing the other categorical columns with bar charts is not practical, because of the sheer number of bars you would get.

The sponsor GSK has the highest number of trials, almost 2,500.

Looking at the phases, most trials are labelled Phase 3.

And logically, most trials have the status Completed.
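Such a frequency bar chart can be sketched with pandas like this (the sponsor values are made up):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative data; in the real notebook df comes from the Excel file
df = pd.DataFrame({"Sponsor": ["GSK", "GSK", "Pfizer", "Merck", "GSK"]})

counts = df["Sponsor"].value_counts()  # frequencies, sorted descending
counts.plot(kind="bar", title="Trials per sponsor")
plt.tight_layout()
plt.savefig("sponsor_bar.png")
print(counts.idxmax())  # sponsor with the most trials
```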

Histograms

I made histograms of the numeric variables ‘Start_Year’, ‘Start_Month’ and ‘Enrollment’.

The histogram of ‘Start_Year’ shows that in the beginning there were not many trials, and that from 2000 on the number of trials went up, with a peak in 2005, 2006 and 2007.

The histogram of ‘Start_Month’ shows a more or less even distribution.

The histogram of ‘Enrollment’ shows what the describe output of the column already told us: most trials have a small number of participants. So, the distribution is right-skewed.
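A sketch of this check (hypothetical enrollment values: many small trials, a few huge ones):

```python
import pandas as pd

# Hypothetical enrollment values: many small trials, a few huge ones
enrollment = pd.Series([5, 20, 30, 40, 60, 80, 120, 300, 900, 84496])

# enrollment.plot(kind="hist", bins=20) draws the histogram in a notebook;
# the skewness statistic confirms what the plot shows:
print(enrollment.skew())      # > 0 means right-skewed
print(enrollment.skew() > 0)  # True for this data
```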

Kernel density estimation

For ‘Start_Year’ and ‘Enrollment’ I would like to see the KDE (kernel density estimation), to find out what the density of occurrence of the trials is.
For ‘Start_Year’ the graph does not give extra insight; for ‘Enrollment’ it does.

This KDE shows that there are also some large trials with a much bigger group of participants.
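A KDE can be drawn straight from pandas; a sketch with made-up values that have a small cluster of large trials:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical enrollment values with a small cluster of large trials
enrollment = pd.Series([10, 20, 25, 30, 40, 50, 5000, 5200, 5400])

ax = enrollment.plot.kde(title="Enrollment density")  # smooth density estimate
plt.savefig("enrollment_kde.png")

# The long right tail / second bump corresponds to the few large trials
print(enrollment.median(), enrollment.mean())
```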

Scatter Plot

I have made a scatter plot. A scatter plot is used to find out whether there is a relationship between two variables. I wanted to see the relationship between ‘Enrollment’ and ‘Start_Year’. In this scatter plot you can see when the bigger trials started.
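A scatter plot sketch (illustrative values, not the real 13,748 trials):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative frame; the real one has 13,748 trials
df = pd.DataFrame({
    "Start_Year": [1999, 2003, 2005, 2006, 2007, 2012],
    "Enrollment": [50, 400, 9000, 200, 15000, 120],
})

df.plot.scatter(x="Start_Year", y="Enrollment", title="Enrollment vs. start year")
plt.savefig("enrollment_scatter.png")

# The largest trials stand out as high points at their start year
print(df.loc[df["Enrollment"].idxmax(), "Start_Year"])
```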

Box Plot

With a box plot I was able to visualize when each sponsor was running its trials.
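The box plot per sponsor could be sketched like this (sponsor names and years are made up):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative frame (sponsor names/years are assumptions)
df = pd.DataFrame({
    "Sponsor": ["GSK", "GSK", "Pfizer", "Pfizer", "Merck"],
    "Start_Year": [2001, 2005, 2010, 2012, 2007],
})

# One box per sponsor: shows in which years each sponsor ran its trials
df.boxplot(column="Start_Year", by="Sponsor")
plt.savefig("sponsor_years_box.png")

print(df.groupby("Sponsor")["Start_Year"].median())
```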

Now that I have explored the data, I can start the next step: telling a story with visualizations in Power BI.

Uses of 8 different energy sources to generate electricity in the UK part 2

#MakeOverMonday Week 32 2019

I have left this challenge for a week and enjoyed the summer holiday season with the kids. In the meantime some people have given me suggestions after reading my earlier blog. So let’s see what I can do today.

As a first step I have changed the table in Power BI: I now have a table with a lot more rows and fewer columns. My columns are ID, Timestamp, Attribute and Value.

The relation between the tables is automatically detected on the ID numbers. A check with a created table shows there is a mistake in this relationship. Digging for the answer only gives me more questions about the data. So let’s stop digging and create a visual with one of the stories I wrote about in my first blog.

The story I pick is the change in use over the years of 3 different types of sources: nuclear, fossil-based and renewables.

  • I made two measures in Power BI to group the sources together.
  • I have made two stacked area charts.
    • One that shows everything over the years.
    • One where you can drill down to a year / month and see the daily use.
  • I did this with the first table I had loaded earlier this week, not with the second table I made today.
  • For the colours I chose purple for nuclear; I was tempted to use red because of the radiation risk, but found that too much. Of course fossil became grey, connected to the colour of coal. And green for the renewables.
  • To make sure that the filter only applies to the second graph, I used the Edit interactions option to disconnect the filters from the first graph.
  • I kept the text in neutral grey and only made the names of the 3 groups bold.
  • Both charts got fixed axes. With the automatic Y-axis it did not start at zero, with the result that the bottom source looked smaller than it should be. The second chart also has a fixed upper bound; this way the Y-axis is the same for every filter that is applied. I found out that February 2012 was the month in which the demand on the sources was the highest.
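The grouping I built with measures in Power BI could be sketched in pandas as well (the source-to-group mapping is my own grouping, and the values below are made up):

```python
import pandas as pd

# Map each source column to one of the three groups (my grouping, not from the file)
GROUPS = {
    "nuclear": "Nuclear",
    "coal": "Fossil", "ccgt": "Fossil",
    "wind": "Renewables", "solar": "Renewables",
    "hydro": "Renewables", "biomass": "Renewables", "pumped": "Renewables",
}

# Long-format table like the one I built in Power BI: Attribute, Value
df = pd.DataFrame({
    "Attribute": ["coal", "ccgt", "wind", "nuclear", "solar"],
    "Value": [100, 200, 50, 80, 20],
})

df["Group"] = df["Attribute"].map(GROUPS)
print(df.groupby("Group")["Value"].sum())
```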

New profile photo LinkedIn and Twitter

Today I spent some time working on a new profile photo for LinkedIn and Twitter, and I’m happy with the result.

I started with these two photos.

This is a photo of the Westduinen, it was the ‘backyard’ I grew up with.

Photo taken by UWV in June 2019

I removed the background of my profile photo and cut out the top half to put it into the photo with the grass. It worked out really nicely and I got this profile photo.

Uses of 8 different energy sources to generate electricity in the UK part 1

#MakeOverMonday Week 32 2019

For this week we received a bigger dataset, so first of all, let’s do some exploring with Python code for the practice.

The dataset has 796,453 rows and 12 columns. Hmm, something to be aware of: 11 of the columns have a name that starts with a space. So, if I want to refer to them, I need to use that space as well.
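An alternative is to strip the names once in pandas; a sketch with stand-in columns:

```python
import pandas as pd

# Columns with a leading space, like 11 of the 12 in the dataset
df = pd.DataFrame({"id": [1, 2], " coal": [100, 90], " wind": [10, 30]})

print(list(df.columns))              # ' coal' and ' wind' have the awkward space
df.columns = df.columns.str.strip()  # remove leading/trailing whitespace once
print(list(df.columns))              # now it is safe to write df['coal']
print(df["coal"].sum())
```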

The information runs from 01-01-2012 to 03-08-2019, so on charts you need to be aware that the sum for 2019 is not comparable with the sums of other years.

There are 8 types of energy sources in this dataset:

  • Biomass: run on imported timber or use sawmill waste.
  • CCGT: Combined Cycle Gas Turbines are gas turbines whose hot exhaust is used to drive a boiler and steam turbine. This two-stage process makes them very efficient in gas usage. They are also quite fast to get online, so they are used to cover peak demand and to balance wind output.
  • Coal
  • Hydro
  • Nuclear
  • Pumped: These are small hydro-electric stations that can use overnight electricity to recharge their reservoirs. Mainly used to meet very short-term peak demands.
  • Solar
  • Wind

The other columns are an ID number, a timestamp, the demand (the total gigawatt demand of the entire UK) and the frequency (the grid frequency is controlled to be exactly 50 Hz on average, but varies slightly).

The Frequency and Solar columns are decimal numbers, the others are whole numbers, and the timestamp is an object.

There are no missing values at all. There must be some duplicates, looking at the timestamp, because there are 796,401 unique timestamps in the 796,453 rows. But 52 rows in the whole dataset are negligible. Every row did get a unique ID, because there I see the same count as the number of rows in the dataset.
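These checks can be sketched like this (a tiny stand-in frame instead of the 796,453 rows):

```python
import pandas as pd

# Tiny stand-in: 5 rows, one duplicated timestamp, all IDs unique
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "timestamp": ["2012-01-01 00:00", "2012-01-01 00:05", "2012-01-01 00:05",
                  "2012-01-01 00:10", "2012-01-01 00:15"],
})

n_rows = len(df)
n_unique_ts = df["timestamp"].nunique()
print(n_rows - n_unique_ts, "rows share a timestamp")  # 1 here; 52 in the real data
print(df["id"].nunique() == n_rows)                    # True: every ID is unique
print(df.isna().sum().sum())                           # 0: no missing values
```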

I used the Quick Insights option of Power BI and got insights that do not tell me anything. So I started to make some myself.

First I looked at the uses by year (2012-2018) for each source.

  • Coal goes down.
  • Nuclear stays around the same level.
  • CCGT grew until 2016 and has flattened out on the way down since.
  • Wind is going up since 2016.
  • Pumped goes down since 2016.
  • Hydro shows some fluctuation, but stays around its average.
  • Biomass: use is going up.
  • Solar shows intense growth in 2017 and stays high in 2018. That growth is so large that when you plot it in a line chart together with coal, which had the highest use in 2012, the coal line goes flat at the bottom of the chart. Because of this I do not consider the information in the solar column to be of good quality.

The 3 main sources in 2012 were Coal, CCGT and Nuclear; in 2018 they are CCGT, Nuclear and Wind.

The demand in GW has gone down over the years by 724,444,951 GW.

I found insights, but did not yet find a story to tell. Let’s take some rest and continue another day.

Challenge myself to change a set of Excel files into a Power BI report; part 2

Description of the Excel files

Per year there are two Excel files, and every Excel file has several worksheets. From each file I need one worksheet: the worksheet with the requested and additional supplies per booth, and the worksheet with the returns per booth and the total stock in the evening.

The worksheet with the returns first has columns that describe the different supplies, followed by columns that tell the user how much a standard amount weighs, followed by booth name columns. Every article has two rows of information: weight and count. We need the count per article. While adding the forms from 2018 I realised that we had changed some articles and that we do not have their weight, while the inventory has been done by weight. We need to come up with a solution for that, even if we keep the old system up and running, because we need to know the count of the items this coming November.

The total required is a worksheet where each column lists what a booth wants to receive. The evening before the FoodFair the booths receive 75% of what they have requested. During the day they can request additional supplies; since 2018 that is also filled in on forms, which are collected in one Excel file.

Both sheets are incomplete and have missing data; let’s see how that works out when I import them into the Power BI Query Editor and transform them into the right format.

Challenge myself to change a set of Excel files into a Power BI report; part 1

Description of the challenge

Every year in November there is the FoodFair of the church we are going to. My husband coordinates the supplies room and he has a set of Excel files (with formulas) to know how much supplies he uses a year, how much is left and how much he needs to buy again.

There are several booths with different food and different supply needs.

At the end of the day we are left with a bunch of handwritten forms that tell us how many supplies have been given out and returned per booth. These handwritten forms go into the Excel file. And we also have a handwritten form with the counted stock at the end of the evening. This form is also copied into an Excel file.

I gave myself the challenge to build a model from the Excel files that can be used on a yearly basis, and to make a report that shows what supplies are requested and what is given back per booth.

As an aside, writing this blog also teaches me more about the possibilities of WordPress, because I want more than just plain black text.

Start of the project

I see this as a real use case, because the data is in Excel in a reader-friendly way, so it needs to be converted. The data is in several different files and has been collected by hand over more than one year.

I have domain knowledge, because I have been helping my husband with inputting data into the files and working with him during the FoodFair days.

The data in Excel consists of two files a year. One file has the returns per booth and the stock at the end of the day. The other file has the requested supplies per booth. And there is a lot more information in the files on other sheets.

The forms from 2018 have not been put into Excel yet, so that is what needs to be done first.

My learning path Power BI part 1

My learning path for Power BI started in April 2017 when I did my first Analyzing and Visualizing Data with Power BI online training on edX. At that moment my passion for Power BI was born.

Over time I have been experimenting with Power BI, and during my Data Science Professional Microsoft Track powered by Techionista I learned more about Power BI.

In the past weeks I have been busy with exploring Power BI, by just diving into #MakeOverMonday challenges and working with trial and error. This way of working helped me to understand the program, but it also fed my hunger in getting to know more about Power BI.
In my search for video material I came across Avi Singh; he has a nice video series in which he starts from the beginning by introducing Power BI. This blog contains my learning notes. I do advise you to watch the video series to understand Power BI better.

Before I started with this series I watched the most basic one: “How to install Power BI”.
In this video he talks about the two ways to install Power BI:

  • From the Microsoft website
  • From the Microsoft store

My Power BI was installed from the Microsoft website, and the plus point of the Store version, that you do not have to update it every month yourself, made me decide to uninstall that version and install Power BI from the Store. Within an hour I went back to the website version. I wanted to change something in the settings and I could not; it kept giving me a popup error message. I did not want to figure out why and decided that the website version, updated manually every month, had worked well in the past, so I would keep it like that.

In Get Data he explains how to import data, how you can edit it with the Query Editor, and how the Query Editor helps you document your steps so that somebody else (or you yourself, after some time) knows what has happened. Good documentation is important in data science, because that way you can explain and prove to others what you have done.

In Relationships he tells you how to make a good data model. He explains about data tables and fact tables.

Up to now this was a repetition of the knowledge I got from earlier training and my own exploration of Power BI.

In the section about DAX I learn some new facts.
A calculated column is a nice way for the human brain to see in the tables what is happening, but it makes the file bigger.
We want a small file, because a Power BI file runs in the memory of the computer.
Storing a big file is not a problem; running a big file in memory can be.
So it is better to use a measure: this does not make the file bigger. To write measures you need more DAX knowledge; in his videos he explains the basics.

There are two types of measures: implicit and explicit.
Implicit means you take the column from the table and put it in the visualisation.
An explicit measure is a measure you made yourself using a DAX formula.
He explains both and tells which one is better to use. To make that clear he gives a really nice picture of why you are building a Power BI file.

The data scientist is the author of an Excel or Power BI file; that file is published in the Power BI cloud and becomes the single source of truth. That file can be made accessible for consumers to use the data with the program they want.
And at that point it becomes very important whether you have used implicit or explicit measures. Implicit measures cannot be put in a pivot table in Excel, while explicit measures can.
A lot of users use Excel, so you need to build for their needs.

He tells a lot more about DAX measures and I start to understand it.

At the end he creates a report, which is then published online as a dashboard. I did build the report he made, but did not publish it. I also changed how I placed the text and images: I first made a JPG using PowerPoint and loaded that as the background of the report (a tip from Marc Lelijveld during a webinar I followed in July 2019).

Here is the created visual.

Asylum applications in Europe part 4

#MakeOverMonday Week 28 2019

Go to the Power BI viz, page 3

In the Facebook group I got feedback on iteration 2:

Loads better! I felt part of the conversation and knew immediately what story you were trying to tell!
Fantastic job! Great blog post! I hope others are encouraged to try this after seeing your journey!

Questions

  1. Is there a legend for the first two graphs?
  2. Do they need to be multicolored or could you make them all the same color and then call out the years you want to point out in a highlight color?

My answers

  1. There is no legend for the first two graphs; if I add one it will be a list of 28 countries and that is a lot.
  2. Looking at answer 1 in combination with question 2, I’m going to make a stacked column chart with 3 layers: the 26 other countries together and the 2 that I’m talking about separately. Then I can also make a legend for the two graphs.

How did I do this?
I wanted to make a measure in Power BI that sums the 26 rows while leaving out the values of Germany and Hungary. I could not get it done with a DAX formula, so I went back to the source, the Excel file, made a new row with the right values and imported the file again.
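For reference, the same three-layer split I built in Excel could be sketched in pandas (the highlighted country names are from the data; the other countries and all values are made up):

```python
import pandas as pd

# Made-up application counts per country
df = pd.DataFrame({
    "Country": ["Germany", "Hungary", "France", "Italy", "Sweden"],
    "Applications": [400, 150, 100, 80, 60],
})

highlight = ["Germany", "Hungary"]
rest = df.loc[~df["Country"].isin(highlight), "Applications"].sum()

# One row per stack layer: the two highlighted countries plus the others combined
stacked = pd.concat([
    df[df["Country"].isin(highlight)][["Country", "Applications"]],
    pd.DataFrame({"Country": ["Other 26"], "Applications": [rest]}),
], ignore_index=True)
print(stacked)
```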

Who can tell me how to do it with a DAX formula?

Suggestion

Another graph that might be interesting is the variance between accepted and denied which you might be able to use a waterfall chart to show in Power BI.

I did not start working on that one yet.

My blog part of Power BI news

Today I got a message from Twitter that there was a post with my name in it, so I went to it. I was wondering how I had contributed to this news. So I followed the link and found a great website with a nice collection of Power BI news.

I started reading, and when I was at the topic “Business” I found my blog about the asylum applications in Europe, part 3.

I feel proud about this.

Asylum applications in Europe part 3

#MakeOverMonday Week 28 2019

Go to the Power BI viz, page 1

Today and yesterday I have been working on a second report with the data given by #MakeOverMonday.

My biggest challenge was getting the data into the right format to create a normal line or bar chart, with the years on the x-axis and the countries as the values. I dove into one of the other reports and found out that the format of the data had been changed: the years were no longer in separate columns, but had been moved into one column, which is what I had in mind as well. So first I worked in Excel and created the format needed to make the report I wanted. It takes time to put it into the right format.

Lesson learned: if I feel the data is not in the right format, change it first, before putting time into making a report and then needing to start over.

While I was busy in Excel I thought: this must be possible in Power BI with the Power Query Editor. So I went back to Power BI, read carefully, and found the option Unpivot Columns in the Transform menu. It did the trick in a click.
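In pandas the counterpart of Unpivot Columns is melt; a sketch (the values are made up):

```python
import pandas as pd

# Wide format: one column per year, as in the original Excel sheet (values made up)
wide = pd.DataFrame({
    "Country": ["Germany", "Hungary"],
    "2015": [100, 50],
    "2016": [120, 30],
})

# melt is the pandas counterpart of Power Query's "Unpivot Columns"
long = wide.melt(id_vars="Country", var_name="Year", value_name="Applications")
print(long)  # 4 rows: one per country/year pair
```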

After the data was in Power BI in the format I wanted, I did some cleaning of the data:

  • Removing empty rows.
  • Changing the value “:” into an empty cell. This value was used several times; it did not mean 0, because there were also zeros in the columns.
  • Changed “Germany (until 1990 former territory of the FRG)” into “Germany”, because all data was from 2009 onwards.
  • Filtered out the aggregate information about the 28 European countries together.
  • Changed “Total positive decisions” into “accepted”, because that fitted better in the graph of denied and accepted.
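The same cleaning steps could be sketched in pandas (the values are made up):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Country": ["Germany (until 1990 former territory of the FRG)", "Hungary", None],
    "Applications": [":", "50", "0"],
})

df = df.dropna(subset=["Country"])                            # remove empty rows
df["Applications"] = df["Applications"].replace(":", np.nan)  # ':' means unknown, not 0
df["Country"] = df["Country"].replace(
    "Germany (until 1990 former territory of the FRG)", "Germany")
print(df)
```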

This time I have built a report with 3 visuals.

Visual 1 shows the total asylum requests in Europe over the year by country.
I noticed a peak in Hungary in 2015 and dove into news articles to find out when Hungary closed its border with Croatia for Middle Eastern refugees; I read this was at the end of 2015. So this news fact is visible in the graph.

Visual 2 shows how many decisions have been made. In Germany you can see that a request and a decision are not always in the same year.

Visual 3 shows the accepted and denied requests. I have chosen a clustered column chart instead of a stacked column chart, because it makes it easier to see which of the two was larger.

For the 3 visuals I gave the Y-axis the same scale.

Asylum applications in Europe part 2

#MakeOverMonday Week 28 2019

Feedback from the #MakeOverMonday review

This Wednesday there was the weekly review of the submitted visualizations. Here are the points I need to work on and be aware of for a next report:

  • Make sure there is a title / header.
  • Check the spelling of the text.
  • When it is mentioned that the data is about people, you do not have to specify that they are divided into two gender groups.
  • The filter on the left-hand side is big; it is better to make it a dropdown list.
  • The tables are not inviting to read; remove them or use a kind of heatmap or only totals.
  • Use a normal bar chart and show the numbers as full amounts; do not use K, because there are amounts below 1,000.

Feedback from members of the Facebook group BI Data Storytelling Mastery

  • What is the takeaway?
  • What story should I see?
  • I suggest considering something different than a stacked bar. I can see the overall total, but understanding differences in composition is hard for the human brain.
  • In your blog can you tell why you chose your visuals, colors, fonts?
  • But most importantly, the story you want to tell. A better story might be approved vs denied, or the overall trend. Is there a story in the reasons? Is there a story in the sex or age difference? Why should the viewer care about these numbers? Help us see the story from your point of view, the refugees’ point of view, the asylum countries’ point of view.
  • Can you make the viewer want to investigate and find out more?

So I’m going back to an empty Power BI sheet to rethink the story that I want to tell with this data.

Asylum applications in Europe part 1

#MakeOverMonday Week 28 2019

Go to the Power BI viz, page 2

This week we are working with data about asylum applications in Europe from 2008 until 2018. The data provided is part of a bigger dashboard; you can find the original at “International protection in the EU+: 2018 overview”. We received the two datasets that were used for these two visuals.

Looking at the data in Excel, the Age column only gives totals, with no differentiation between minors and adults. The decision column also gives 4 descriptions other than the 4 used in the original graph. The two columns with totals do not give the same results, so it is not possible to combine the two sets, and therefore not possible to find out how many of the applicants per year got asylum and what type of asylum. This is probably because of the length of the application process.

I decided to make a visualisation where you can filter by country how many people are requesting asylum and what the decisions are by year. I could not connect the request and decision numbers to each other.

Alcohol Consumption By Country

#MakeOverMonday Week 26 2019

This week I want to participate in the weekly challenge of #MakeOverMonday; let’s see if I have enough spare time between job applications, family activities and the Python for Data Science training. The last two weeks I could not find the time for the challenge.

This week it is a small dataset: it has 25 rows (countries) with their alcohol consumption per capita. In the article I read that this is per capita for people older than 15 years.
In the original visualization there is a bar chart of these 15 countries.
On Wednesday I watched the live webinar and was impressed with the visuals other people made. I got a lot of inspiration and felt challenged to find out what I can do.

In my head grew the idea to add the population per country, to show how many liters are drunk per country. I found information about the population per country for 2017 and added this to the Excel spreadsheet. Now I needed to find out how much of the population is above 15 years old.
No, I could not find it, so that was the end of the idea.

So I just used the data I had and created some graphs showing what is in the data. The information I found about the population by country in 2017 has been used.

I did submit it to #MakeOverMonday Challenge.

Next training step

While busy looking for a new employer I also have time to do some more data science training. I went back to my trusted platform edX.org and decided to start with the MicroMasters® Program in Data Science from UCSanDiegoX.

The first course is called Python for Data Science and has 8 modules; there will be cases with data on Kaggle.

The first module reviews what data science is and how to conduct data science research. It is a recap of what I have learned in the other modules. Nice to get a recap and connect the dots.

I’m taking this training because I want to practise more with data modeling and predicting results (Machine Learning)

Data about sleeping hours in America

#MakeOverMonday Week 23 2019

Introduction

From MakeOverMonday I got data about sleep times per day, age and sex, with the question to rework the graph and remake it. This week I first took the data into a Jupyter Notebook to analyse it with Python; after that I went to Power BI for the visualization.

Exploring the data

The data has 945 rows with 8 columns.
There is no missing data.
It covers 15 years (2003-2017).
The people are put into 7 age groups, and there is a group with all the information together. They are also divided into 3 sex groups (both, men and women).
There are 2 different types of days (“Nonholiday weekdays” and “Weekend days and holidays”), and there is a group where all information has been put together.
In Python I left the data as it was; I did not alter it.

I made several histograms to see what they were telling me. The histogram of ‘Average hours per day sleeping’ gave me a nice right-skewed distribution (skewness 0.466).
The mean is 8.069 and the median is 8.81.

The correlation between the year and the ‘Average hours per day sleeping’ is 0.15, which is small.
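The skewness and correlation numbers come from calls like these (the values below are illustrative stand-ins, not the real BLS data):

```python
import pandas as pd

# Illustrative sleep data (the real values come from the BLS dataset)
df = pd.DataFrame({
    "Year": [2003, 2005, 2008, 2011, 2014, 2017],
    "Avg hrs per day sleeping": [8.0, 8.1, 8.2, 8.6, 8.7, 9.6],
})

col = "Avg hrs per day sleeping"
print(df[col].skew())            # positive -> right-skewed
print(df[col].corr(df["Year"]))  # correlation with the year
```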

I want to know if there is a relation between ‘Average hours per day sleeping’ and the categorical variables.
I found something, as can be seen in the two boxplots: it looks like 15-to-24-year-olds sleep more hours.

During the Weekend days and holidays, people sleep more hours

This is the original graph that we were asked to remake:

https://www.bls.gov/tus/charts/chart16.jpg

Visualize the data in Power BI

I did not find it easy to pick a topic to visualize; nothing really caught my attention while exploring the data, besides the fact that people spend a lot of hours sleeping. I think the title should be: hours spent in bed.

I decided to visualize the average hours spent in bed by age group over the years, with the possibility to filter on men or women.

Unit 2: Why Star Trek? > Star Trek and the Business of Cable Television Starlog

For this Unit we have been watching the pilots of all the series. Here is how we rank them and what we think of them.

3. “The Cage.”
Star Trek: The Original Series
running: 1966-1969
playing: 2265
remark: Shown on network television; a male-dominated episode. Number One is female, but is not seen as female by Captain Pike, and the other lead characters are male. It helps that I have seen more Star Trek, because this pilot does not speak to me and I’m not interested in watching a second episode.
2. “Where No Man Has Gone Before.”
Star Trek: The Original Series
running: 1966-1969
playing: 2265
remark: First episode with Captain Kirk; still mainly male, but better than “The Cage”.
1. “Encounter at Farpoint.”
Star Trek: The Next Generation
running: 1987-1994
playing: 2365
remark: In this pilot the new Enterprise and her crew members are introduced to us, along with how they knew each other in the past and what their relations are.
We are also introduced to Q, how the Q think about people, and that people need to prove otherwise.
6. “Emissary.”
Star Trek: Deep Space 9
running: 1993-1999
playing: 2365
remark: The beginning of a new job, but first closing off an old life and grieving. Again dealing with another life form that thinks humans are cruel.
4. “Caretaker.”
Star Trek: Voyager
running: 1995-2001
playing: 2365
remark: A leap in time and space: we jump 75,000 light years because a caretaker wants to pay off his debts to a planet. The caretaker dies and the technology is destroyed.
In the meantime Voyager and their enemy from home have to become friends for the 75-year journey back home. Two people from far away want to leave and travel to Earth.
5. “Broken Bow.”
Star Trek: Enterprise
running: 2001-2005
playing: 2151
remark: Nice to see a series made in 2001 that plays before Star Trek: The Original Series; you see older machinery, but the design is newer 😉
“The Vulcan Hello.”
Star Trek: Discovery
running: since 2017
playing: 2256
remark: We watched 3 episodes; it is addictive, but not in the line of the old series. Personally I prefer the older series.
I did not see at first that this series plays before The Next Generation, although it looks like it plays later. From the Klingon plot I got the idea it was set after Voyager.

Make Over Monday Week 21 iteration 2

After the review by Eva Murray (Head of BI at Exasol and Tableau Zen Master) and Jeff Shaffer (COO & Vice President at Unifund, Tableau Zen Master), I made an iteration of my visualisation.

The suggestions were:

  • About the title: do not put information there that is not connected with the dataset.
  • Change the bar chart into a column chart.
  • Show the month by its abbreviation.
  • Make sure the scale does not change when using a filter (so fixed x and y axes).
    • This was a challenge, but I made it, YES!!
  • Grammar: use the word “from” instead of “in” (July).
  • And highlight the most fatal months.

The interaction between the charts and the filters was seen as a positive thing.

Second time Makeover Monday

After two weeks of the Business Case from Techionista it is time to do a MakeOverMonday challenge again.

Today we are asked to work with data from Ali Sanne, who collected, prepared and distributed the data on data.world. It is data about deadly bear attacks on people, from 1900 till 2018.

Read the original report at Fox.

My visualisation in Power BI

First day of the Business Case of the Microsoft Azure Academy for Data Science, powered by Techionista

Today we started with the Business Case. Five companies have given five groups an assignment to work on. Our group starts with the attendance of supporters during the home matches of Ajax in the Johan Cruijff Arena.

My first #SWDChallenge part 6; Feedback from Cole Knaflic @storywithdata

I got homework for May 20, for when I’m done with the Techionista training.

My first #SWDChallenge part 4; Progress

  • First step: make a table of the worksheet. It still has additional rows that do not contain the right information.
  • Load the Excel file into Power BI and do some transformations:
    • Fill the blanks in the “week nr” column.
    • Replace the 1 null value in the column “onderdeel”.
    • The columns “begintijd” and “eindtijd” need to contain only a time, no date.
    • Recalculate the column “totale tijd”.
    • Round the column “in uren” to 2 decimals.
  • Oh, the column “totale tijd” is not giving the right values; I want to see the number of minutes in there. Back to Edit Queries to find out why. Ah, the data type was Any; I changed that to Duration in minutes.
  • Yes, my data looks good.
  • Add the Excel file with the advised hours from Microsoft and Techionista.
    • Do some transformations.
  • Check the relation between the two tables.
  • First visualization: a bar chart showing my hours, compared with Microsoft and Techionista.
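The transformation steps above could be sketched in pandas as well (the Dutch column names are from my Excel file; the values and the “onbekend” placeholder are my assumptions):

```python
import pandas as pd

# Stand-in for the study-hours sheet (Dutch column names from the Excel file)
df = pd.DataFrame({
    "week nr": [1, None, None, 2],
    "onderdeel": ["Python", "Python", None, "SQL"],
    "in uren": [1.2345, 0.5, 2.0, 3.14159],
})

df["week nr"] = df["week nr"].ffill()                 # fill blanks downward, like Fill Down
df["onderdeel"] = df["onderdeel"].fillna("onbekend")  # replace the null value
df["in uren"] = df["in uren"].round(2)                # round to 2 decimals
print(df)
```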

Conclusion hours by subject

The red bars show the hours that were planned by Techionista; black is what Microsoft advised. For 3 topics Techionista scheduled fewer hours than Microsoft advised. Green is what I actually spent; for 3 subjects and the capstone I spent more hours than Microsoft had advised.

The second visualization I want to make is a line chart showing the time spent per day from the beginning to the end.

  • First I added a new table with dates, so I could use the date as the x-axis
  • Then I made a chart visualising the hours per day, with the topic as the legend, only to realise that a line chart is not clear. So I changed it into a bar chart, which gave me the view I wanted.
  • I also made a table to show the description by subject
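The date table plus hours-per-day-per-topic shape behind that bar chart can be sketched in pandas as well. A minimal sketch on made-up data: the column names “datum”, “onderdeel”, and “in uren” and the dates are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample of the hours log.
log = pd.DataFrame({
    "datum": pd.to_datetime(["2019-02-01", "2019-02-01", "2019-02-02"]),
    "onderdeel": ["Python", "SQL", "Python"],
    "in uren": [3.0, 2.5, 4.0],
})

# A continuous date table, like the one I added in Power BI, so every
# day in the period appears on the x-axis.
dates = pd.DataFrame({"datum": pd.date_range(log["datum"].min(),
                                             log["datum"].max())})

# Hours per day, with the topic as the legend (one column per topic) —
# the data behind the bar chart.
per_day = (dates.merge(log, on="datum", how="left")
                .pivot_table(index="datum", columns="onderdeel",
                             values="in uren", aggfunc="sum", fill_value=0))
print(per_day)
```

From here, `per_day.plot(kind="bar", stacked=True)` would give the stacked bar chart with the topic as legend.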

Conclusion hours by day and subject

The further I got into the study, the more hours I spent per day.

Webinar MakeOverMonday

Today I watched a webinar about the visualisations other people had made with the same data I had used. I saw a number of beautiful things and learned a bit more again.

My first #SWDChallenge part 2

The new challenge is here: Challenge May 2019
This month’s #SWDchallenge comes from guest author Mike Cisneros.

The challenge

Go out and collect a dataset of your own, analyze it and create a graph visualizing your findings. (Remember: sometimes the smallest, most specific stories can tell the most universal truths.)

My reaction

Oh, I wanted to join, but I do not have a dataset of my own. Okay, I will join next month.


Microsoft Data Science Track completed

Yes, today I received the certification as proof that I have successfully completed all 11 modules.


Started on February 1 2019, finished on April 30 2019; below is an overview of the number of hours I put in. Travel time is not included. The training took place 3 days a week in the Amsterdam ArenA and the rest was home study. The travel time to the Amsterdam ArenA is 3.5 hours per day.

The report below is an interactive report in Power BI.

The modules I now master in theory are: Introduction to Data Science, Power BI, Analytics Storytelling for Impact, Ethics and Law in Data and Analytics, Querying Data with Transact-SQL, Introduction to Python for Data Science, Essential Math for Machine Learning: Python Edition, Data Science Research Methods: Python Edition, Principles of Machine Learning: Python Edition, Developing Big Data Solutions with Azure Machine Learning, Microsoft Professional Capstone: Data Science

My first time at Makeover Monday

This was the given visualisation

There have been 216 spacewalks at the International Space Station since December 1998.

I downloaded the data (an xls file with 2 worksheets), saved it as two CSVs, and pulled it into Power BI.
There I did some transformations:

  • In the table ISS Spacewalks I made sure that the “year” column was a Whole Number with the type Date and Year.
  • The “Number of Spacewalks” column also needed to be a Whole Number.
  • In the table Spacewalks I extracted the Year out of the “Date” column.
  • The ‘Duration (hours)’ column has two values containing a “-”, which needed to be removed before I could convert the column to a number.
  • I made a many-to-many relationship between the two tables on the year.
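The cleaning of the Spacewalks table can be sketched in pandas too. A minimal sketch, assuming the column names “Date” and “Duration (hours)” from the steps above; the rows are made up for illustration.

```python
import pandas as pd

# Hypothetical sample of the Spacewalks worksheet.
spacewalks = pd.DataFrame({
    "Date": ["1998-12-07", "1999-05-29", "2000-10-15"],
    "Duration (hours)": ["7.35", "-", "6.5"],
})

# Extract the year out of the "Date" column.
spacewalks["Year"] = pd.to_datetime(spacewalks["Date"]).dt.year

# Convert the durations to numbers; errors="coerce" turns the
# "-" placeholder values into NaN instead of raising an error.
spacewalks["Duration (hours)"] = pd.to_numeric(
    spacewalks["Duration (hours)"], errors="coerce")

print(spacewalks)
```

With the derived `Year` column in both tables, the many-to-many join from Power BI corresponds to a pandas `merge` on `Year`.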

After that my first challenge was to rebuild the given visualisation.

I’m proud of myself that I managed to create this.
I see that more can be done with this dataset, but not today. Maybe I will give it another shot in the upcoming days.

Unit 2: Why Star Trek? > Star Trek and the Business of Cable Television Homework

In Unit 2 “Why Star Trek”, Module 2 “Star Trek and the Business of Cable Television”, our assignment is a media analysis. We are asked to watch the pilot episode of each of the 6 live-action Star Trek television series and consider how the storyline of each pilot episode relates to the world at the time it was created.
Luckily we have Netflix, where we can watch them.
This homework will take some time.

Here is the list of the pilot episodes we have been asked to think about for the Media Analysis. (Both pilots for Star Trek: The Original Series are included.)

  • 1966 “The Cage.” Star Trek: The Original Series
  • 1966 “Where No Man Has Gone Before.” Star Trek: The Original Series
  • 1987 “Encounter at Farpoint.” Star Trek: The Next Generation
  • 1993 “Emissary.” Star Trek: Deep Space 9
  • 1995 “Caretaker.” Star Trek: Voyager
  • 2001 “Broken Bow.” Star Trek: Enterprise
  • 2017 “The Vulcan Hello.” Star Trek: Discovery

Question: Which pilot best addresses the contemporary societal issues from when it was produced while taking the most advantage of the television format on which it was shown? Rank the episodes you watch in numerical order where 1 is the episode that best answers the question prompt.

Unit 2: Why Star Trek? > Star Trek and The Business of Network Television

Star Trek and The Business of Network Television

Question:
To what extent did the business model of network television enable Star Trek: The Original Series to appeal to such a wide range of audiences? In what ways did that same model constrain it?

Star Trek: The Original Series had a big appeal, because all types of people could recognize themselves in the characters in the series.

– Kirk was daring, bold, emotional, and heroic
– Spock brought rational thought, a level head, and logical thinking to the table
– Bones was practical and compassionate.

Star Trek was watched by viewers young and old.

Thanks to Sean Patrick Guthrie’s blog post of July 20, 2016, and to Rabobank for the design of the business model that I have used to fill in.

Course Star Trek: Inspiring Culture and Technology via edX

Just a break from my Data Science education: a course for fun. Nevertheless, it has a connection with science.

About this course

Intro to the course

Why has Star Trek, which began as a failed network series, become so influential? Instead of fading away, the Star Trek universe now encompasses feature films, additional television series, and a universe of fan conventions and memorabilia. 

What about the shows and movies resonate with so many people? The powerful vision of futuristic space exploration drew on real history and contemporary issues to enhance its storytelling. Star Trek inspired audiences to ask fundamental questions about who they are and how they relate to the world around them.

When you enroll in this course, you will examine how Star Trek’s live action television shows and motion pictures affected audiences around the world. With your hosts, Margaret Weitekamp and Scott Mantz, you will discover the connections between Star Trek and history, culture, technology and society. You will hear from experts, watch clips from the shows and films, debate with fellow fans, and explore your own perspectives on and understanding of Star Trek’s lasting impact.

Through critical analysis and object exploration, you will examine how Star Trek tackled controversial topics, such as race, gender, sexuality, and ethics. Then, the mission is yours. Join the community to engage in civil discourse. Use evidence to understand how Star Trek shaped and still influences our technology and society. 

This course is offered under license by CBS Consumer Products.

What you’ll learn

In this course, students will:

  • Learn why we should study Star Trek as a lens for media scholars to analyze the history of television, the impact of science fiction on technology, and the phenomenon of fandom
  • Explore how Star Trek depicted a future where humans were explorers of the universe – serving as an inspiration to individuals and government agencies deeply involved in the race to get human beings into space for the first time
  • Understand how Star Trek‘s diverse crew prompted audiences to reconsider their own perceptions of different races and genders
  • Reflect on how Star Trek depicted various characters working to understand themselves and their place in the universe
  • Recognize how Star Trek inspires reflection on our own humanity and our place in the universe

https://www.edx.org/course/star-trek-inspiring-culture-and-technology

Study update

Today I completed Module 4 of the Python training.

I have now read through the theory of:

  • Lists, subsetting and manipulating lists
  • Functions, Methods & Packages
  • Numpy, 2D Numpy arrays & Basic Statistics with Numpy

Time to take a break and let it sink in; tomorrow I will continue.

This looks interesting and challenging

Today I was reading a post on LinkedIn and thought: this looks interesting and challenging. It was a post about storytelling with data, and they challenge people to make a visualisation in the first week of each month and share it. I’m thinking of taking up this challenge this month and seeing how far I can get with the knowledge I have right now and my available time.

Start of the track becoming a Data Scientist

In this blog I want to keep track of my first visible steps in the world of Data Science.
In the past I had already undertaken actions that prepared me for this big step: starting the training at Techionista, Becoming a Data Scientist with a certificate from Microsoft.

Today, January 28 2019, I started with the first track in the training: Introduction to Data Science.