Predict the number of cyclists

In the city of Västerås, Sweden, pedestrians and cyclists are counted at several locations around the city.

A dashboard showing the counts is available online: https://data.eco-counter.com/ParcPublic//?id=4615

By looking at the calls the webpage makes, we can get the data source for the different counters, for example, the one counting cyclists on the street “Hamngatan”:
https://www.eco-visio.net/api/aladdin/1.0.0/pbl/publicwebpageplus/data/100030444?idOrganisme=4615&idPdc=100030444&fin=31%2F12%2F2021&debut=18%2F12%2F2014&interval=4&flowIds=102030444

Parse this data into a table with 2 columns: date, number.
The file is in JSON format and is a list of lists. It can be parsed in e.g. Notepad++ (search-replace or JSON plugin), Excel data import, scripting (e.g. Python), or websites (search JSON to Excel or CSV, for example, http://convertcsv.com/json-to-csv.htm works and can even take the URL as input).
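
For the scripting route, a minimal Python sketch could look like the one below. It assumes each inner list starts with a date followed by a count, and that dates look like "12/18/2014"; neither has been checked against the real response, so adjust `date_format` and the indices to what you actually see. `SAMPLE`, `parse_counts` and `to_csv` are made-up names for illustration, and the sample values are invented.

```python
import csv
import io
import json
from datetime import datetime

# Hypothetical sample in the assumed shape; the real payload comes from the URL above.
SAMPLE = '[["12/18/2014", "120"], ["12/19/2014", "95"], ["12/20/2014", null]]'


def parse_counts(text, date_format="%m/%d/%Y"):
    """Turn a JSON list of [date, count, ...] lists into (date, number) rows."""
    rows = []
    for entry in json.loads(text):
        day = datetime.strptime(entry[0], date_format).date()
        # Keep missing counts as None so gaps stay visible later on.
        number = None if entry[1] is None else int(float(entry[1]))
        rows.append((day, number))
    return rows


def to_csv(rows):
    """Write the rows as CSV text with the two columns: date, number."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["date", "number"])
    for day, number in rows:
        writer.writerow([day.isoformat(), "" if number is None else number])
    return buffer.getvalue()


rows = parse_counts(SAMPLE)
print(to_csv(rows))
```

To use it on the real data, save the response from the URL to a file and pass the file's contents to `parse_counts`.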

Once you have organized the data you can move on to perform some initial analysis.
How many observations are there? What time frame do they cover? What are the maximum and minimum values? What is the average number?
Are there any gaps in the observations?
Are there any extreme values that might be errors? Should they be removed? What can happen if you do not remove extreme values?
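
The questions above can be sketched in a few lines of Python. The rows here are invented, daily data is assumed, and "more than five times the median" is only one possible rule of thumb for flagging extreme values, not a recommendation from the data provider.

```python
from datetime import date, timedelta
from statistics import mean, median

# Hypothetical (date, number) rows with one missing day and one suspicious value.
rows = [
    (date(2021, 3, 1), 410),
    (date(2021, 3, 2), 395),
    (date(2021, 3, 3), 430),
    (date(2021, 3, 5), 402),   # 4 March is missing
    (date(2021, 3, 6), 9000),  # suspiciously high
    (date(2021, 3, 7), 388),
]


def summarize(rows):
    """Basic facts: number of observations, time frame, min, max, mean."""
    days = [d for d, _ in rows]
    counts = [n for _, n in rows]
    return {
        "observations": len(rows),
        "start": min(days),
        "end": max(days),
        "min": min(counts),
        "max": max(counts),
        "mean": mean(counts),
    }


def find_gaps(rows):
    """Dates between the first and last observation that have no row."""
    seen = {d for d, _ in rows}
    start, end = min(seen), max(seen)
    all_days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    return [d for d in all_days if d not in seen]


def flag_extremes(rows, factor=5):
    """Rows whose count exceeds `factor` times the median count."""
    typical = median([n for _, n in rows])
    return [(d, n) for d, n in rows if n > factor * typical]


summary = summarize(rows)
gaps = find_gaps(rows)
extremes = flag_extremes(rows)
cleaned = [row for row in rows if row not in extremes]
print(summary["mean"], mean([n for _, n in cleaned]))
```

In this made-up sample the single extreme value lifts the mean from 405 to 1837.5, which shows how strongly one bad reading can distort an average.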

Perform Predictive Analysis

Can you predict the number of cyclists that will pass tomorrow? In a week's time? In a month?

  • Approach with time series analysis
  • Approach with regression analysis
  • Approach with tree analysis
  • Approach with an advanced machine learning method
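
Before trying any of these approaches, it helps to have a simple baseline to beat. The sketch below is one possible time series baseline: it forecasts a day as the average of the most recent observations on the same weekday. The function name `weekday_average_forecast`, the `weeks` parameter and the history values are invented for illustration.

```python
from datetime import date, timedelta


def weekday_average_forecast(history, target, weeks=4):
    """Forecast `target` as the mean of up to `weeks` earlier same-weekday counts.

    `history` maps date -> count. Steps back from `target` seven days at a
    time and uses the most recent dates that are present in `history`.
    """
    first = min(history)
    past = []
    day = target - timedelta(days=7)
    while day >= first and len(past) < weeks:
        if day in history:
            past.append(history[day])
        day -= timedelta(days=7)
    if not past:
        raise ValueError("no earlier observations on that weekday")
    return sum(past) / len(past)


# Hypothetical history: four weeks with 400 cyclists on weekdays, 200 on weekends.
start = date(2021, 3, 1)  # a Monday
history = {}
for i in range(28):
    day = start + timedelta(days=i)
    history[day] = 200 if day.weekday() >= 5 else 400

last_day = max(history)
print(weekday_average_forecast(history, last_day + timedelta(days=1)))
print(weekday_average_forecast(history, last_day + timedelta(days=7)))
```

A regression, tree or machine learning model is only worth its extra complexity if it predicts better than a baseline like this on data it has not seen.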

Presentation

Present your best estimate of the number of cyclists a week from now. (Check the result after a week!)

How reliable is this prediction?

Which concepts and variables are your prediction model based on?

Did you compute or add any variables (e.g. month, weekday, weather, moon cycle, school holidays)?
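
Calendar variables can be computed directly from the date, as in the sketch below. The function name `add_calendar_features` and the sample rows are made up for illustration. Weather, moon cycle and school holidays would have to be joined in from external sources and are not covered here.

```python
from datetime import date


def add_calendar_features(rows):
    """Extend (date, number) rows with variables derived from the date."""
    features = []
    for day, number in rows:
        features.append({
            "date": day,
            "number": number,
            "month": day.month,                # 1 = January ... 12 = December
            "weekday": day.weekday(),          # 0 = Monday ... 6 = Sunday
            "is_weekend": day.weekday() >= 5,  # Saturday or Sunday
            "iso_week": day.isocalendar()[1],  # ISO week number
        })
    return features


# Hypothetical rows: a Friday and the Saturday after it.
table = add_calendar_features([(date(2021, 3, 5), 402), (date(2021, 3, 6), 150)])
for row in table:
    print(row)
```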

Who could benefit from this analysis and these results?

Extra Challenge

We are using an unofficial API to access the data. As such we can only guess how it works. The variable names give some clues, for example, the dates are “debut” (French for start) and “fin” (end).

It seems that by altering the URL you can change the data resolution. Try changing “interval=4” to “interval=1”, “interval=5” or “interval=6”.
What do you get? How does this change the analysis?
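
Rather than editing the URL by hand, the variants can be generated with Python's standard library, as sketched below. The helper name `with_params` is made up, and since the API is unofficial nothing here says what each interval value means; the code only builds the URLs and does not fetch them.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# The Hamngatan URL from above, split over several lines for readability.
BASE_URL = (
    "https://www.eco-visio.net/api/aladdin/1.0.0/pbl/publicwebpageplus/data/100030444"
    "?idOrganisme=4615&idPdc=100030444"
    "&fin=31%2F12%2F2021&debut=18%2F12%2F2014"
    "&interval=4&flowIds=102030444"
)


def with_params(url, **changes):
    """Return `url` with the given query parameters replaced or added."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))  # decodes %2F to "/"
    query.update({key: str(value) for key, value in changes.items()})
    # urlencode re-encodes "/" as %2F, so the dates keep their original form.
    return urlunsplit(parts._replace(query=urlencode(query)))


variants = {interval: with_params(BASE_URL, interval=interval) for interval in (1, 5, 6)}
for interval, url in variants.items():
    print(interval, url)
```

The same helper can change “debut” and “fin” to request a different date range, for example `with_params(BASE_URL, fin="31/12/2022")`.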
