What data do we already have?
Which existing data can we use, and how does it relate to the measurement systems?
Together with a data scientist, the city examined the available data, covering both the output of the living labs for measurement systems and the data we already hold.
Various data sources
We have access to a range of data from which we aim to derive valuable insights, including:
- Current and historical crowd measurements.
- Mobility data: street parking.
- Mobility data: parking garages.
- Traffic circulation data from traffic loops and ANPR cameras.
- Other mobility data, such as car-sharing, bike-sharing, bike counters, park-and-rides, Waze, etc.
- Tourism data: hotel occupancy and reservations.
- Visitor numbers for cultural and tourism sites.
- Visitor numbers for events.
How usable this data is varies significantly along three dimensions:
- (Nearly) continuous data vs. time-limited datasets.
- Data available via API or not.
- Real-time data or post-factum data.
The analysis by VLOED indicates that data is most useful for tracking crowd levels when it forms a long time series, is available via API, and is delivered in real time. Since many data sources are related to crowd levels, this is a key consideration for most systems.
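As an illustration only, the three criteria above can be expressed as a simple checklist score per data source. This is a minimal sketch, not the VLOED analysis itself: the `DataSource` class, the `usefulness_score` function, the one-year history threshold, and all values in the example list are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    name: str
    history_days: int  # length of the available time series, in days
    has_api: bool      # can the data be retrieved via an API?
    real_time: bool    # is the data delivered in real time?


def usefulness_score(source, min_history_days=365):
    """Count how many of the three criteria a source meets (0 to 3).

    The one-year default threshold is an assumption for illustration.
    """
    return (
        int(source.history_days >= min_history_days)
        + int(source.has_api)
        + int(source.real_time)
    )


# Hypothetical example values, not actual properties of the city's data.
sources = [
    DataSource("event visitor numbers", 30, False, False),
    DataSource("parking garage occupancy", 1500, True, True),
    DataSource("hotel occupancy", 800, False, False),
]

# Rank sources from most to least useful under these criteria.
ranked = sorted(sources, key=usefulness_score, reverse=True)
for source in ranked:
    print(usefulness_score(source), source.name)
```

An equal-weight count is the simplest possible choice; in practice the criteria may deserve different weights depending on the use case.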
Proxy data for crowds
We identified datasets within Ghent that could serve as strong proxies for city crowd levels. Further work on predictive models confirmed this, so it is crucial to focus on the availability and quality of this type of data. Proxy data becomes particularly valuable when combined with data that shows how crowds are distributed across different areas of the city.
Specifically, these proxy datasets include:
- Occupancy data for parking garages.
- Circulation data based on ANPR cameras.
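One common way to check whether a dataset behaves as a proxy is to measure how strongly it correlates with the crowd measurements over the same time slots. The sketch below computes a Pearson correlation coefficient in plain Python. It is an assumption about how such a check could look, not a description of the method used in the project, and the sample numbers are invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Invented sample values for six matching time slots.
garage_occupancy = [0.35, 0.50, 0.62, 0.80, 0.91, 0.74]  # fraction of spaces in use
crowd_counts = [1200, 1900, 2400, 3300, 3900, 3000]      # measured visitors

r = pearson(garage_occupancy, crowd_counts)
print(f"correlation between garage occupancy and crowd counts: {r:.2f}")
```

A value close to 1 suggests the candidate series moves together with the crowd measurements. Correlation alone does not show where in the city the crowd is, which is why the proxy is most useful alongside data on spatial distribution.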
Data gaps
Every system faces challenges and issues that lead to gaps in the data. This becomes particularly apparent when exploring possibilities for predictive models. For example, a device might fail, or the systems for measuring, processing, or storing data might experience outages. A parking facility might be temporarily inaccessible, or its capacity might be reduced. In all these situations, both the data and the predictive models must be able to cope. Data must therefore be managed continuously and monitored critically.
The main challenge for data analysis and predictive models is handling missing data. Therefore, VLOED also focuses on data monitoring and data imputation.
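To make monitoring and imputation concrete, the sketch below first locates runs of missing values in a regularly sampled series and then fills short gaps by linear interpolation between the neighbouring measurements. Linear interpolation is chosen here only because it is simple; the source does not say which imputation technique VLOED uses. The function names, the `max_gap` parameter, and the sample series are hypothetical.

```python
def find_gaps(values):
    """Return (start, end) index pairs, inclusive, for each run of None."""
    gaps = []
    start = None
    for i, value in enumerate(values):
        if value is None and start is None:
            start = i
        elif value is not None and start is not None:
            gaps.append((start, i - 1))
            start = None
    if start is not None:
        gaps.append((start, len(values) - 1))
    return gaps


def impute_linear(values, max_gap=3):
    """Fill gaps of at most max_gap points by linear interpolation.

    Gaps at the edges of the series, or longer than max_gap, are left
    as None so that they stay visible for monitoring.
    """
    filled = list(values)
    for start, end in find_gaps(values):
        left, right = start - 1, end + 1
        length = end - start + 1
        if left < 0 or right >= len(values) or length > max_gap:
            continue
        step = (values[right] - values[left]) / (length + 1)
        for k in range(length):
            filled[start + k] = values[left] + step * (k + 1)
    return filled


# Invented occupancy counts at regular intervals; None marks a missing reading.
occupancy = [120, 130, None, None, 160, None, 150, None]

print(find_gaps(occupancy))
print(impute_linear(occupancy))
```

Leaving long or trailing gaps unfilled is a deliberate choice in this sketch: an outage of several hours is better flagged than silently smoothed over.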
The need for integrated data management
Multidisciplinary data, long real-time time series, data quality, data imputation, and proxy data all play a significant role in data analysis, provided they are actively pursued and managed. To build on the VLOED project, we need a well-developed data management plan. Implementing this plan will also affect the need for specific crowd measurement systems and allow us to work effectively with the data we need.