Data Assimilation

Data assimilation

Introduction

Data assimilation is a technique that combines observations with a numerical model to improve the accuracy of forecasts or analyses. Applied to acoustics and noise maps, data assimilation makes it possible to integrate real noise measurements (e.g coming from sensors) into noise propagation models to produce more accurate and reliable noise maps.

In this tutorial, we will see how to model and simulate noise in Geneva 🇨🇭 using static (road network, buildings) and dynamic (traffic, temperature) data. The main objective is to combine measurements from sensors located in Geneva with acoustic simulations to identify traffic configurations that best match measurements.

Requirements

To play with this tutorial, you will need:

  • a working installation of NoiseModelling (NM) with at least version 5.0.1. If needed, get the last release on the official GitHub repository,

  • the tutorial’s datasets, stored in the folder .../NoiseModelling_5.0.1/data_dir/data/wpsdata/dataAssimilation/,

  • the dedicated WPS .groovy scripts, stored in the folder .../NoiseModelling_5.0.1/data_dir/scripts/wps/DataAssimilation/.

The data

In the the tutorial’s folder, you have the following data:

  1. sensor measurement .csv files, stored in the devices_data/ folder (One file per sensor),

  2. device_mapping_sf.geojson containing two columns : the point geometry (THE_GEOM) and a sensor’s unique identifier (DEVEUI`). This file is also provided in .csv format,

  3. the OpenStreetMap (OSM) dataset of the studied area : geneva.osm.pbf.

Sensor measurements

Each .csv files in the folder devices_data/ contains environmental noise measurements recorded by individual sensors. The columns are:

  • deveui : Unique identifier of the sensor,

  • epoch : Time of measurement (Epoch format - Unix time, ex:1724567400),

  • Leq : Equivalent continuous sound level in dB(A), calculated over a period (15 min),

  • Temp : Temperature (°C) recorded by the sensor at the time of measurement,

  • timestamp : Time of measurement (timestamp format "YYYY-MM-DD HH:MM:SS").

Below is an illustration with the file 4a6.csv. Here we can see that 5 measurements were taken, every 15 minutes, between 6:30 and 7:30 am. During this period, noise levels rose slightly, while temperatures remained stable overall (except for the first measurement).

Informations stored in the sensor’s file 4a6.csv

deveui

epoch

Leq

Temp

timestamp

4a6

1724567400

40.6

10.00

2024-08-25 06:30:00

4a6

1724568300

41.9

18.125

2024-08-25 06:45:00

4a6

1724569200

43.9

18.125

2024-08-25 07:00:00

4a6

1724570100

42.3

18.125

2024-08-25 07:15:00

4a6

1724571000

45.5

18.125

2024-08-25 07:30:00

Warning

In this tutorial all the values, coming from the 7 sensors, are sampled approximately every 15 minutes, but the exact spacing may vary slightly.

We could also have had data with a time step of 5 minutes, 10 minutes, 1 hour, etc.

In all cases, it is important that all the measurements from the sensors are set to the same time step AND that they are phased (i.e. set to the same time intervals).

Sensor localization

The device_mapping_sf.geojson columns are:

  • deveui : Unique Identifier of the sensor,

  • the_geom : 3D point geometry in WKT (Well-Known Text) format — includes coordinates (X, Y) and altitude (Z) in the projected coordinate system.

Below is a map, showing the seven sensors (red points), with their identifier deveui.

Sensors in Geneva

How to compute the assimilation?

To compute the data assimilation, you will have to execute several WPS .groovy scripts. You can play with them in two ways:

  • with the NoiseModelling’s GUI (Graphic User Interface). In this case, the WPS blocks are listed in the Data_Assimilation tab (see screenshot below),

  • in command line (see how to Pilot NoiseModelling with scripts). In this case, just note that they are stored in the folder .../NoiseModelling_5.0.1/data_dir/scripts/wps/DataAssimilation/,

  • in a .groovy script, calling one or various WPS .groovy scripts.

List of WPS blocks

Data Simulation

This process prepares the training dataset from sensor measurements, importing it into NoiseModelling (in a spatial database) and performing various calculations to determine noise levels.

Maps are generated with all possible combinations, in order to identify the best road configuration in terms of noise levels generated.

Step 1 : Generate all possible combinations

Generates all possible combinations of values from two given lists and inserts them into a table named ALL_CONFIGURATIONS.

To calculate the combinations, you have to execute the WPS script All_Possible_Configuration.

Two input parameters are needed:

  • Traffic values (trafficValues): variation around standard values for the 4 types of roads: primary, secondary, tertiary and others. When generating a variation of the default map (the set of maps), the values taken by the primary, the secondary, … sections, will be calculated from these parameters (between 1 and n). For example, if "trafficValues": "0.01, 1", then the primary sections will take 1% or 100% of their default value. The same applies to the remaining road types.

  • Temperature values (temperatureValues) : the temperatures (in °C - Double). The 1 to n possible temperatures to play (in the example below, 3 temperatures are defined: 10, 15 and 20°C)

Execute All_Possible_Configuration WPS block

With the NoiseModelling GUI

All possible configuration

With command lines

1cd /home/.../NoiseModelling_5.0.1/
2
3./bin/wps_scripts -w ./ -s data_dir/scripts/wps/Data_Assimilation/All_Possible_Configuration.groovy -trafficValues "0.01,1,2" -temperatureValues "10,15,20"

With Groovy script

new All_Possible_Configuration().exec(connection,[
               "trafficValues": "0.01,1.0,2.0",
               "temperatureValues": "10,15,20"
])

Result

The generated combinations include values for type of roads primary, secondary, tertiary, others, and temperature. The resulting table ALL_CONFIGURATIONS has the following columns:

  • IT: Unique identifier of the combination (Primary Key - Integer)

  • PRIMARY_VAL: percentage of primary roads traffic, given by trafficValues (Double)

  • SECONDARY_VAL: … of secondary roads … (Double)

  • TERTIARY_VAL: … of tertiary roads …(Double)

  • OTHERS_VAL: … of other roads …(Double)

  • TEMP_VAL: temperature (Double)

The first 10 lines of this table are shown below:

IT

PRIMARY_VAL

SECONDARY_VAL

TERTIARY_VAL

OTHERS_VAL

TEMP_VAL

1

0.01

0.01

0.01

0.01

10

2

0.01

0.01

0.01

0.01

15

3

0.01

0.01

0.01

0.01

20

4

1.0

0.01

0.01

0.01

10

5

1.0

0.01

0.01

0.01

15

6

1.0

0.01

0.01

0.01

20

7

1.0

1.0

0.01

0.01

10

8

1.0

1.0

0.01

0.01

15

9

1.0

1.0

0.01

0.01

20

10

1.0

1.0

1.0

0.01

10

Warning

The total number of combinations can be huge. This value is defined as: (number of trafficValues elements) ^ 4 * (number of temperatureValues elements).

In our example, we have "trafficValues": "0.01, 1.0, 2.0" and "temperatureValues": "10,15,20", so the number of combinations = 3 ^ 4 * 3 = 243 (before filtration - see ‘note‘ tab just below)

Even though this table is very important, only part of it will be used for all the maps to be simulated (see Step 5)

Note

When the combinations are calculated, a filter is applied to remove inconsistent pairs (e.g. an ‘other’ type road with much more traffic than a ‘primary‘ road).

As a result, only pairs meeting the following rule are retained: traffic on a lower type of road / traffic on a higher type of road <= 20

This rule eliminates cases where a road of a lower type would have 20 times more traffic than another of a higher type.

Step 2 : Import sensor positions

Using the Import_File WPS script*, import the location of the sensors into the NoiseModelling’s database from the .geojson file device_mapping_sf.geojson. This file will be stored in a table called SENSORS_LOCATION.

* in the data_dir/scripts/wps/Ìmport_and_Export/ folder

🌍 Since we are in the Geneva area, we are using the CH1903+ metric coordinate system (identified as EPSG:2056).

  • pathFile : .../device_mapping_sf.geojson

  • inputSRID : 2056

  • tableName : SENSORS_LOCATION

If you are using the Groovy script

new Import_File().exec(connection,[
                "pathFile" : workingFolder+"device_mapping_sf.geojson",
                "inputSRID" : 2056,
                "tableName": "SENSORS_LOCATION"
])

Result

Once done, you have the table SENSORS_LOCATION, presented below.

THE_GEOM

DEVEUI

POINT Z(2499748.01396418 1118023.64763599 4)

57a

POINT Z(2497926.50847343 1119715.69600241 4)

649

POINT Z(2499855.07784477 1118032.40228679 4)

582

POINT Z(2497737.00785445 1119351.14658559 4)

51b

POINT Z(2499392.11210506 1118441.15233318 4)

4a6

POINT Z(2498584.3443027 1119522.320612 4)

650

POINT Z(2498084.77671566 1119172.90847018 4)

5c5

Step 3 : Prepare sensor data

Now we can extract and prepare the sensors, for a given period. To do so, we are using the Prepare_Sensors WPS .groovy script stored in the folder .../data_dir/scripts/wps/DataAssimilation/.

This script has the following parameters:

  • startDate : the start timestamp to extract the dataset (in format YYYY-MM-DD HH:MM:SS)

  • endDate : the start timestamp to extract the dataset (in format YYYY-MM-DD HH:MM:SS)

  • trainingRatio : define the percentage of data to be used for training (e.g a value of 0.7 means that 70% of the data will be used for training. The remaining 30% will be used for validation) (Double)

  • workingFolder : folder containing the file device_mapping_sf.csv, the OSM file and the devices_data folder.

  • targetSRID: Target projection identifier (also called SRID) of your project (Integer)

Execution

For this tutorial, you can fill with these informations:

  • startDate : 2024-08-25 06:30:00

  • endDate : 2024-08-25 07:30:00

  • trainingRatio : 0.8

  • workingFolder : .../data_dir/data/wpsdata/dataAssimilation/ (enter the full URL e.g /home/myUserName/Documents/NoiseModelling_5.0.0/NoiseModelling_5.0.0/data_dir/data/wpsdata/dataAssimilation/)

  • targetSRID : 2056

If you are using the Groovy script

new Prepare_Sensors().exec(connection,[
                "startDate":"2024-08-25 06:30:00",
                "endDate": "2024-08-25 07:30:00",
                "trainingRatio": 0.8,
                "workingFolder": ".../data_dir/data/wpsdata/dataAssimilation/",
                "targetSRID": 2056
])

Result

Once executed, two tables are created:

  • SENSORS_MEASUREMENTS : representing all the data for this period, with sensor’s position (POINT geometry)

  • SENSORS_MEASUREMENTS_TRAINING : filtering SENSORS_MEASUREMENTS to keep only the training dataset. This table (described below) will be used as the RECEIVER table for the following steps.

    • IDSENSOR : Unique identifier of the sensor,

    • THE_GEOM : The sensor’s position (POINT geometry),

    • IDRECEIVER : The receiver’s unique Id

    • EPOCH : Time of measurement (Epoch format - Unix time, ex:1724567400),

    • LAEQ : Equivalent continuous sound level in dB(A), calculated over a period (15 min),

    • TEMP : Temperature (°C) recorded by the sensor at the time of measurement.

Step 4: Import buildings and roads

Now, using the Ìmport_OSM block, you can import buildings and road network (with predicted traffic flows) from the geneva.osm.pbf OSM file.

Execution

  • Path of the OSM file (pathFile): .../data_dir/data/wpsdata/dataAssimilation/geneva.osm.pbf

  • Target projection identifier (targetSRID): 2056

  • Do not import Surface acoustic absorption (ignoreGround): true

  • Remove tunnels from OSM data (removeTunnels): true

The other parameters are left as default

If you are using the Groovy script

new Import_OSM().exec(connection, [
                "pathFile"      : ".../data_dir/data/wpsdata/dataAssimilation/geneva.osm.pbf",
                "targetSRID"    : 2056,
                "ignoreGround"  : true,
                "removeTunnels" : true
])

Result

The tables BUILDINGS and ROADS are created.

Note

If you are using NoiseModelling with the GUI and wish to visualize the data in a map, you can use the Table_Visualization_Map block. Below is the result with the two table BUILDINGS (left) and ROADS (right).

BUILDINGS and ROADS

Step 5 : Generate all traffic emissions and maps

This step consists in generating all the traffic emissions by modifying traffic data according to the road type, using data from ALL_CONFIGURATIONS (see Step 1).

1. Generate emissisons

To do so, users have first to execute the Data_Simulation WPS block, which has only one optionnal parameter :

  • noiseMapLimit : final number of maps to be generated (Integer). If a value is filled, a random selection is applied to keep the number of expected maps, based on the LHS (Latin Hypercube Sampling) method.

Execution

For this tutorial, you can fill with this information:

  • noiseMapLimit : 80

If you are using the Groovy script

new DataSimulation().exec(connection,[
                "noiseMapLimit": 80
])

Result

Two tables are created:

  • LW_ROADS : table containing all traffic emissions

  • ROADS_GEOM : table containing the geometry of roads

2. Calculate noise levels

Now, users can calculate the noise levels, emitted from the road sources. To do so, execute the Noise_level_from_source WPS script.

Execution

For this tutorial, you can fill with this information:

  • Source geometry table name (tableSources): ROADS_GEOM

  • Source emission table name (tableSourcesEmission): LW_ROADS

  • Buildings table name (tableBuilding): BUILDINGS

  • Receivers table name (tableReceivers): SENSORS_LOCATION,

  • Separate receiver level by source id (confExportSourceId): false

  • Maximum source-receiver distance (confMaxSrcDist): 250

  • Diffraction on vertical edges (confDiffVertical): false

  • Diffraction on horizontal edges (confDiffHorizontal): false

If you are using the Groovy script

new Noise_level_from_source().exec(connection, [
                "tableSources": "ROADS_GEOM",
                "tableSourcesEmission" : "LW_ROADS",
                "tableBuilding": "BUILDINGS",
                "tableReceivers": "SENSORS_LOCATION",
                "confExportSourceId": false,
                "confMaxSrcDist": 250,
                "confDiffVertical": false,
                "confDiffHorizontal": false
])

Result

The table RECEIVERS_LEVEL is created and stores all the generated maps (see the three first lines below - you can scroll the table to the right).

IDRECEIVER

PERIOD

THE_GEOM

HZ63

HZ8000

LAEQ

LEQ

1433

344

SRID=2056;POINT Z (2496912.2844730876 1116884.1525357629 4)

44.601295

14.591279

38.107693

46.40716

1433

448

SRID=2056;POINT Z (2496912.2844730876 1116884.1525357629 4)

46.617485

15.979266

40.203373

48.37574

1433

648

SRID=2056;POINT Z (2496912.2844730876 1116884.1525357629 4)

46.617485

15.979266

40.203373

48.37574

Step 6 : Extract best configuration

Many maps have been generated. So now, the best map, minimizing the difference between the measurements and the simulation, must be chosen.

To do so, users have to execute the Extract_Best_Configuration WPS script. There are 3 parameters to enter here:

  • Measurement table (observationTable): name of the table where observed data are stored,

  • Noise map table (noiseMapTable): name of the table where simulated data are stored,

  • Temperature tolerance threshold (tempToleranceThreshold): tolerance threshold (exprimed in °C) for the temperature that allows to extract the map that have a temperature value close to the real temperature.

This process will:

  • first determine the LAEQ difference between simulated (noiseMapTable) and observed (observationTable) values,

  • then calculate, for each time steps and for all maps, the median value of the difference between the two values. The map having the smallest median value will be the best one.

Execution

For this tutorial, you can fill with this information:

  • observationTable : SENSORS_MEASUREMENTS_TRAINING

  • noiseMapTable : RECEIVERS_LEVEL

  • tempToleranceThreshold : 5

If you are using the Groovy script

new Extract_Best_Configuration().exec(connection,[
                "observationTable": "SENSORS_MEASUREMENTS_TRAINING",
                "noiseMapTable": "RECEIVERS_LEVEL",
                "tempToleranceThreshold"  : 5
])

Result

Note

The best configuration is found for each time step (here 15 minutes)

As a result, the BEST_CONFIGURATION_FULL table is created (see table below). This table is made of the following columns:

  • EPOCH : Time of measurement (Epoch format - Unix time, ex:1724567400),

  • MIN_MEDIAN_DIFF : the minimum median of the difference between simulated and measured LAEQs (this is how the best maps are chosen)

  • IT : Unique identifier of the combination (Primary Key - Integer)

  • PRIMARY_VAL : percentage of primary roads traffic, given by trafficValues (Double)

  • SECONDARY_VAL : … secondary roads …

  • TERTIARY_VAL : … tertiary roads …

  • OTHERS_VAL : … other roads …

  • TEMP_VAL : temperature (Interger)

Table BEST_CONFIGURATION_FULL showing one best configuration for each of the time steps

EPOCH

MIN_MEDIAN_DIFF

IT

PRIMARY_VAL

SECONDARY_VAL

TERTIARY_VAL

OTHERS_VAL

TEMP_VAL

1724567400

9.3827

366

2.8

2.8

0.01

0.2

10

1724569200

11.2629

577

3.0

0.2

3.0

0.2

15

1724570100

11.2

577

3.0

0.2

3.0

0.2

15

1724571000

11.625

577

3.0

0.2

3.0

0.2

15

1724568300

11.1953

679

3.0

3.0

0.2

0.2

25

1724569201

7.8047

679

3.0

3.0

0.2

0.2

25

1724569202

10.3047

679

3.0

3.0

0.2

0.2

25

Execute Simulation: Generate the Dynamic Map

This part is made to execute a dynamic traffic calibration process, using the best configuration.

Step 7 : Generate new receivers

Create a regular grid of receivers between the buildings.

To do so, use the Regular_Grid WPS script (in the /Receivers/ wps scripts folder).

Execution

For this tutorial, you can fill with this information:

  • Buildings table name (buildingTableName): BUILDINGS

  • Table bounding box name (fenceTableName): BUILDINGS

  • Source table name (sourcesTableName): ROADS

  • Offset (delta): 200 (1 receiver every 200m)

If you are using the Groovy script

new Regular_Grid().exec(connection,[
              "buildingTableName": "BUILDINGS",
              "fenceTableName": "BUILDINGS",
              "sourcesTableName":"ROADS",
              "delta": 200
])

Result

The table RECEIVERS is created.

Three first lines of the table RECEIVERS

THE_GEOM

ID_COL

ID_ROW

PK

POINT Z(2496682.2844730876 1116854.1525357629 4)

1

1

1

POINT Z(2496882.2844730876 1116854.1525357629 4)

2

1

2

POINT Z(2497482.2844730876 1116854.1525357629 4)

5

1

5

RECEIVERS map

Step 8 : Adding sensors as receivers

Optionally, add the sensors location into the RECEIVERS table (just created before).

Execution

In the Merged_Sensors_Receivers WPS block, fill these information:

  • The receiver table (tableReceivers): RECEIVERS

  • The sensors table (tableSensors): SENSORS_LOCATION

If you are using the Groovy script

new Merged_Sensors_Receivers().exec(connection,[
                "tableReceivers": "RECEIVERS",
                "tableSensors" : "SENSORS_LOCATION"
])

Result

The RECEIVERS table has now new points, the sensor’s one

RECEIVERS and SENSORS map

Step 9 : Generate dynamic road emissions

For each time step (here 15 min), generate an emissions map for all the receivers, corresponding to the best configuration (for this time step).

To do so, use the NMs_4_BestConfigs WPS block. This block has two inputs:

  • The best configuration table (bestConfig)

  • The road emission table (roadEmission)

Execution

  • bestConfig : BEST_CONFIGURATION_FULL

  • roadEmission : LW_ROADS

If you are using the Groovy script

new NMs_4_BestConfigs().exec(connection)
                "bestConfig": "BEST_CONFIGURATION_FULL",
                "roadEmission" : "LW_ROADS"

Result

The table LW_ROADS_best is created, with the following structure:

  • PK : Unique Identifier

  • IDSOURCE : Source id

  • PERIOD : time

  • HZ63 : noise level for the 63 HZ frequency band

  • HZ125 : … for 125 HZ …

  • HZ250 : … for 250 HZ …

  • HZ500 : … for 500 HZ …

  • HZ1000 : … for 1000 HZ …

  • HZ2000 : … for 2000 HZ …

  • HZ4000 : … for 4000 HZ …

  • HZ8000 : … for 8000 HZ …

Step 10 : Generate the noise levels

Using the Noise_level_from_source WPS block, we can finally compute the noise level from the network sources emission (LW_ROADS_best) based on all the receivers.

Execution

  • Source geometry table name (tableSources): ROADS_GEOM

  • Source emission table name (tableSourcesEmission): LW_ROADS_best

  • Buildings table name (tableBuilding): BUILDINGS

  • Receivers table name (tableReceivers): RECEIVERS

  • Separate receiver level by source id (confExportSourceId): false

  • Maximum source-receiver distance (confMaxSrcDist): 250

  • Diffraction on vertical edges (confDiffVertical): false

  • Diffraction on horizontal edges (confDiffHorizontal): false

If you are using the Groovy script

new Noise_level_from_source().exec(connection, [
                      "tableSources": "ROADS_GEOM",
                      "tableSourcesEmission" : "LW_ROADS_best",
                      "tableBuilding": "BUILDINGS",
                      "tableReceivers": "RECEIVERS",
                      "confExportSourceId": false,
                      "confMaxSrcDist": 250,
                      "confDiffVertical": false,
                      "confDiffHorizontal": false
])

Result

We obtain the table RECEIVERS_LEVEL.

Step 11 : Create & visualize the resulting table

Create a table, called ASSIMILATED_MAPS, containing both sound levels and configuration parameters.

To do so, execute the Create_Assimilated_Maps WPS block, with the 3 following parameters:

  • bestConfigTable : the best configuration table name

  • receiverLevel : the receivers level table name

  • outputTable : the output table name

Execution

For this tutorial, you can fill with this information:

  • bestConfigTable : BEST_CONFIGURATION_FULL

  • receiverLevel : RECEIVERS_LEVEL

  • outputTable : ASSIMILATED_MAPS

new Create_Assimilated_Maps().exec(connection,[
                "bestConfigTable" : "BEST_CONFIGURATION_FULL",
                "receiverLevel" : "RECEIVERS_LEVEL",
                "outputTable": "ASSIMILATED_MAPS"
])

Result

The resulting ASSIMILATED_MAPS table has the following columns:

  • TIMESTAMP : time of the measure

  • LAEQ : noise level (dB(A))

  • THE_GEOM : receiver’s geometry (POINT)

  • IDRECEIVER : receiver’s unique identifier

Three first lines of the table RECEIVERS

TIMESTAMP

LAEQ

THE_GEOM

IDRECEIVER

1724567400

38.16436

SRID=2056;POINT Z (2496912.2844730876 1116884.1525357629 4)

1433

1724567400

39.29134

SRID=2056;POINT Z (2496912.2844730876 1116864.1525357629 4)

1197

1724567400

42.9573

SRID=2056;POINT Z (2496912.2844730876 1116844.1525357629 4)

961

Visualize the map

You can now export the ASSIMILATED_MAPS table, for example as a Shapefile, using the Export_Table WPS block and then import it into your favorite GIS app (such as QGIS) to visualize the results.

  • exportPath : results/ASSIMILATED_MAPS.shp

  • tableToExport: ASSIMILATED_MAPS

or

new Export_Table().exec(connection,
                ["exportPath": "results/ASSIMILATED_MAPS.shp",
                 "tableToExport": "ASSIMILATED_MAPS"
])

The resulting map may looks like this:

ASSIMILATED_MAPS map

Legend: Here, we applied the color scheme proposed by Beate Tomio (see Noise Map Color Scheme) on the column LAEQ. We also added the input sensors, the buildings (light grey) and the roads (white lines).

Remark: In this example, we have chosen to have a receiver every 200m (see Step 7). To achieve a denser rendering, we could have made the receiver grid denser. Below is an example, on the same area, but with a 20m grid.

ASSIMILATED_MAPS map