The Complete Guide to Coronavirus Geotracking Apps and Time Series Database Analysis

The Coronavirus causes heavy damage to many industry sectors. One hope is therefore to contain its spread with Coronavirus geotracking apps and cell analysis. One potential solution is Bluetooth-driven apps, which do not violate privacy; another is Coronavirus geotracking with third-party data. A further solution is complete Coronavirus geotracking with Big Time Series Analysis, with the data stored and analyzed in a Time Series Database backend.

In this article, we discuss how extended geotracking with smartphone apps works. Furthermore, we explain how location fingerprinting can be built on top of extended location data to gain more data privacy.

Coronavirus Geotracking VS Anonymous Bluetooth tracking

In a previous article, we discussed anonymous tracking via Bluetooth.

The huge disadvantage of Coronavirus geotracking apps compared with pure Bluetooth solutions is the lack of privacy.

Cicero before the court in Rome reminds us that judging a Big Data solution and its privacy implications is not an easy task.
It takes a very well-educated lawyer to decide whether the needs of the many and the advantages justify extended Coronavirus geotracking over a privacy-compliant Bluetooth tracking app.

On the other hand, extended Coronavirus geotracking enables much more precise matching and makes it possible to understand infection chains from a geographic perspective as well.

In addition, extended tracking can use a fused approach. It is possible to go way beyond pure GPS coordinates.

Tracking which WiFi networks are received at a certain strength, or which are connected, leads to new insights. For example, when two smartphones are connected to the same WiFi, there is a high likelihood that their owners were in the same room; when they are connected to two different WiFis at the same time, those are likely different flats even if the GPS position is similar.

The same goes for nearby Bluetooth devices (speakers, keyboards and so on): the devices found near a smartphone give more insight into which other smartphones were near the same devices at the same time. In addition, smartphones can broadcast Bluetooth signals themselves, which are then received by other smartphones.

An additional data point is the signal strength of cell towers. It can be revealing when two people have nearly identical readings for all cell towers around them.

This additional data, all of which relates purely to geolocation, makes it possible to compute the distance between two smartphones, and thus the infection risk, with much higher precision.

All of this data offers new possibilities and a much larger analysis vector than pure Bluetooth or GPS, which can lead to completely new insights when machine learning is applied.

Therefore, countries, politicians and users have to decide whether the benefits of advanced Coronavirus geotracking outweigh the privacy concerns.

In the following, we describe how an advanced Coronavirus geotracking approach is implemented.

How Coronavirus Geotracking works

A smartphone reports received signals and its GPS position to a central government Time Series Database. Once an infection with the Coronavirus occurs, a matching via Time Series Analysis is executed and potentially infected people are warned.
Smartphones receive plenty of radio signals and GPS data which can be used for location-based matching. When this data is loaded into a Time Series Database, a matching between the data of different smartphones reveals potential infection chains.
  1. A smartphone app receives different kinds of location-indicating data. These are:
    – Cell tower signals of the mobile provider
    – The connected WiFi and the signal strength of non-connected WiFis
    – Bluetooth devices and Bluetooth Beacons near the smartphone
    – GPS position
    – A triangulated location from the operating system provider of the mobile phone
  2. All the location data described above is sent to a Coronavirus database or cached locally on the smartphone.
  3. Once a person is infected, they report the infection to the central database.
  4. The infected person's movement history is matched with the movements of other people to see who was at similar coordinates, logged into the same WiFi or receiving the same Bluetooth Beacons.
    The matching can be done in a central database to which all users transmit their telemetry regularly, or via an interface (API) against which healthy users can match their movement profile.
  5. From the matching, users receive a risk alert when they were in the same WiFi, received the same Bluetooth Beacons or were at the same geocoordinates at the same time.

Coronavirus Geotracking app

Developing a background Coronavirus geotracking app which collects cell tower signals, WiFis, Bluetooth devices, GPS and locations from the operating system is challenging.

The main challenge is that the Coronavirus geotracking app has to reside in the background and receive regular location updates without using too much energy. For instance, GPS tracking consumes battery heavily, whereas cell signal scanning does not consume a lot of power.

iPhones have restrictions on apps running in the background, which we already discussed when talking about the pure Bluetooth app. Therefore, we focus on Android in the following.

Low energy scanning

To save battery during location tracking, one needs to consider that both the scanning intervals and the scanning types (WiFi, GPS, Bluetooth) have huge implications for battery life.

Therefore, one can use energy-efficient, passive scans, which do not eat into the battery, to decide which signals are captured and whether GPS is turned on.

From experience, we propose the following states for deciding between battery-heavy and battery-light location scans.

  • Refinement
    Often, the location provider of the operating system first delivers a coarse-grained location and then, a few seconds later, a more fine-grained location within the proximity of the coarse-grained one.
    When that happens, a GPS and WiFi update is not needed and it is enough to scan for Bluetooth devices.
  • Tiny Movement
    The last location has moved only slightly (<40 m). The passive location provider of the OS shows only a tiny distance update, and the cell tower readings and the WiFi signals do not differ much. Alternatively, a connection is established to a WiFi that was previously visible but not connected.
    A cell update and an update of WiFi signals and Bluetooth should be done.
  • Movement
    A WiFi network is disconnected, the signal strength of cell towers differs a lot, or the operating system indicates a larger movement.
    All signals, including GPS, should be updated.
    Since this regularly happens when moving in a vehicle, the GPS update should be deferred until the movement stops, to avoid continuous movement tracking. However, since other people can be in the moving vehicle, Bluetooth and WiFi scans should still be done, as they are more energy-efficient.
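
The three states can be turned into a small decision routine. The following is only a sketch under assumptions: the class, enum and parameter names are hypothetical, the 10-second refinement window is an illustrative value, and only the 40 m bound comes from the description above.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical sketch: maps cheap, passive observations to the scans worth running. */
final class ScanPlanner {

    enum MovementState { REFINEMENT, TINY_MOVEMENT, MOVEMENT }

    enum Scan { BLUETOOTH, WIFI, CELL, GPS }

    // 40 m is the tiny-movement bound named in the text.
    static final double TINY_MOVEMENT_METERS = 40.0;
    // Assumption: a finer fix arriving within this window counts as a refinement.
    static final long REFINEMENT_WINDOW_SECONDS = 10;

    /**
     * @param movedMeters            distance between the last and the new passive OS location
     * @param previousAccuracyMeters accuracy radius of the last (coarse) location
     * @param secondsSinceLastFix    time elapsed since the last location
     * @param wifiDisconnected       true if the connected WiFi network was lost
     */
    static MovementState classify(double movedMeters,
                                  double previousAccuracyMeters,
                                  long secondsSinceLastFix,
                                  boolean wifiDisconnected) {
        if (wifiDisconnected) {
            return MovementState.MOVEMENT;
        }
        // A quick follow-up fix inside the old accuracy radius only refines the position.
        if (secondsSinceLastFix <= REFINEMENT_WINDOW_SECONDS
                && movedMeters <= previousAccuracyMeters) {
            return MovementState.REFINEMENT;
        }
        return movedMeters < TINY_MOVEMENT_METERS
                ? MovementState.TINY_MOVEMENT
                : MovementState.MOVEMENT;
    }

    /** Scans to run for a state; GPS is deferred while the device is still moving. */
    static Set<Scan> scansFor(MovementState state, boolean stillMoving) {
        switch (state) {
            case REFINEMENT:
                return EnumSet.of(Scan.BLUETOOTH);
            case TINY_MOVEMENT:
                return EnumSet.of(Scan.CELL, Scan.WIFI, Scan.BLUETOOTH);
            default:
                return stillMoving
                        ? EnumSet.of(Scan.CELL, Scan.WIFI, Scan.BLUETOOTH)
                        : EnumSet.allOf(Scan.class);
        }
    }
}
```

A real app would feed these inputs from the passive location provider and connectivity callbacks of the operating system, and would likely also take the cell tower signal differences into account.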

Location Datatypes

Another key challenge is the strength of the signals. It is not enough to store just the devices and networks which were seen; one also has to store how long they were seen, in order to match them with other people who were near the same devices later.

Here is how one can store the Bluetooth devices as a list.

// when the device appeared for the first time
Date seenFrom;
// when the device appeared for the last time
Date seenTo;
// the maximum distance
float maxdistance;
// the MAC address of the device
String bluetoothAddress;
// optional things like a URL for Eddystone beacons, Beacon Regions ...
Map<String, String> additions;

With simple rules, the similarity between two scans can then be computed to derive the movement type. An easy way is listed below; an actual implementation can use more complex measures to compute the movement from the signal strength.

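// counts how many devices of the current scan were already seen in the prior scan
int similarity = 0;
for (BluetoothDevice btDevice : btDevices) {
    for (BluetoothDevice priorBtDevice : priorBtDevices) {
        if (priorBtDevice.bluetoothAddress.equals(btDevice.bluetoothAddress)) {
            similarity++;
            // each current device is counted at most once
            break;
        }
    }
}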
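
One of the "more complex measures" could be a normalized score, so that scans with many and with few devices become comparable. The sketch below uses the Jaccard index over the device addresses; this measure and the class name `ScanSimilarity` are illustrative choices, not something the approach prescribes.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: normalized similarity of two scans by their device addresses. */
final class ScanSimilarity {

    /** Jaccard index |A and B| / |A or B| of the addresses in two scans, in [0, 1]. */
    static double jaccard(List<String> currentAddresses, List<String> priorAddresses) {
        Set<String> current = new HashSet<>(currentAddresses);
        Set<String> prior = new HashSet<>(priorAddresses);
        if (current.isEmpty() && prior.isEmpty()) {
            return 1.0; // two empty scans are treated as identical
        }
        Set<String> shared = new HashSet<>(current);
        shared.retainAll(prior);
        Set<String> union = new HashSet<>(current);
        union.addAll(prior);
        return (double) shared.size() / union.size();
    }
}
```

A value close to 1 suggests that the surroundings barely changed, while a value close to 0 suggests a larger movement. The hash sets also avoid the nested loop.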

Similarly to Bluetooth devices, the routers around a smartphone can be scanned and assembled into a list. Below we show the properties of a scanned WiFi; it is similar to the Bluetooth device record, and the same goes for cell information.

An example WiFi record

// when the network appeared for the first time
Date seenFrom;
// when the network appeared for the last time
Date seenTo;
// the hardware address of the router
String BSSID;
// the network name
String ssid;
// the signal strength over the reading period
int signalDbMin;
int signalDbMax;
int signalDbAvg;
// whether the smartphone was connected to the network
boolean isConnected;

All in all, this leads to detailed location information containing Bluetooth, WiFi and cell tower information. This extended location can then be used to track the location history and to compute the movement relative to prior locations.

List<CellInfo> cellInfo;
List<WifiInformation> wifiInformation;
List<BluetoothBeacon> bluetoothInformation;

GPSLocation operatingSystemtriangulatedLocation;
GPSLocation gpsLocation;

We see that the information is now much more extensive than a pure GPS position. This is an advantage because GPS is often outdated and battery-intensive, and it makes it possible to match potential infection contacts based on additional criteria.

Extended location Time Series Analysis

In the following, we see Time Series Data of four different smartphones.

An excavation site with different excavated levels resembles extended time series location analysis.
Excavating and reconstructing past events is a very different type of extended location and time series analysis, from another scientific area.

Smartphones recognize different signals and GPS positions. We imagine that one of the records belongs to an infected person, and we want to find possible matches and how high the likelihood of an infection is.

Time from | Duration (min) | Smartphone ID | Bluetooth devices | WiFis | Connected WiFi | Cell Tower IDs | Best GPS (OS and Signal)
14:00 | 15 | 1 | b3:c0:83:16:27:cd, fa:c9:27:cb:8e:40, fe:41:85:56:ee:78 | 84:56:04:a4:6b:58, 5d:03:4c:ea:21:d9 | 5d:03:4c:ea:21:d9 | 11111, 22222 | 49.2949,8.6468
14:03 | 15 | 2 | - | 5d:03:4c:ea:21:d9 | 5d:03:4c:ea:21:d9 | 33333 | 49.2949,8.6468
14:05 | 25 | 3 | b3:c0:83:16:27:cd | 84:56:04:a4:6b:58 | - | 22222 | 49.294,8.646
14:07 | 3 | 4 | - | 84:56:04:a4:6b:58, 5d:03:4c:ea:21:d9 | - | 33333 | 49.2949,8.6468

Time Series Analysis for matching infection risks

In the simplest case, this Time Series Data can now be loaded into a Time Series Database, and time series analysis queries can then be run on top of it to find likely matches. In the following we show a few examples.

Query | Bluetooth | WiFis | Connected WiFis | Cell Towers | GPS | Smartphones: shared minutes
Nearby Bluetooth and GPS distance | b3:c0:83:16:27:cd | - | - | - | 110 m precision | 1, 3: 10 minutes
Nearby WiFi | - | 84:56:04:a4:6b:58, 5d:03:4c:ea:21:d9 | - | - | - | 1, 4: 3 minutes
Connected WiFi | - | - | 5d:03:4c:ea:21:d9 | - | - | 1, 2: 12 minutes
Same GPS | - | - | - | - | 11 m precision | 1, 2: 12 minutes; 1, 4: 3 minutes; 2, 4: 3 minutes

Same Bluetooth and GPS distance

We query for the same received Bluetooth devices and the GPS distance. We find two smartphones, 1 and 3, which received the same Bluetooth device for 10 minutes.

The GPS positions match at a precision of 110 meters; taken together, this could be an indicator of a possible infection.
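
To put a number on the GPS distance between two readings, the great-circle distance can be computed with the haversine formula. The following is a minimal sketch; the class name `GeoDistance` is hypothetical and the mean earth radius of 6,371 km is an assumption of the formula.

```java
/** Hypothetical helper: great-circle distance between two GPS readings. */
final class GeoDistance {

    // Assumption: mean earth radius; the real earth is not a perfect sphere.
    static final double EARTH_RADIUS_METERS = 6_371_000.0;

    /** Haversine distance in meters between two latitude/longitude pairs given in degrees. */
    static double haversineMeters(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_METERS * Math.asin(Math.sqrt(a));
    }
}
```

For the readings of smartphones 1 and 3 (49.2949,8.6468 and 49.294,8.646), `haversineMeters` returns a little over 100 meters. The difference comes mostly from the missing fourth decimal in the coarser reading, which is why the result should be read as a precision bound rather than as an exact distance.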

Nearby WiFi

WiFi signals can indicate whether people were near one another. Hence, we use a Time Series Database query to answer the question of which smartphones received the same WiFis at the same time.

The result shows that smartphones 1 and 4 shared these WiFis for 3 minutes. One can now run a longer analysis and look at the signal strength or a longer time period to get a clear indication of whether the smartphones were in the same room.

Connected WiFi

From the signal time series data, we also query the Time Series Database for which smartphones were connected to the same WiFi and for how long.

This is especially important because a WiFi normally belongs to one entity, such as a household or a restaurant, and is therefore a good indicator of a potential infection.

We see that the Time Series Database query results in a shared time of 12 minutes for smartphones 1 and 2.
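
The shared minutes follow from the overlap of the two sighting intervals: smartphone 1 is seen from 14:00 for 15 minutes and smartphone 2 from 14:03 for 15 minutes, which overlap from 14:03 to 14:15. A minimal sketch of that computation, with a hypothetical class name and the simplifying assumption that both intervals lie on the same day:

```java
import java.time.Duration;
import java.time.LocalTime;

/** Hypothetical helper: overlap of two sighting intervals on the same day. */
final class SharedTime {

    /** Minutes during which [startA, startA + minutesA) and [startB, startB + minutesB) overlap. */
    static long sharedMinutes(LocalTime startA, long minutesA, LocalTime startB, long minutesB) {
        LocalTime endA = startA.plusMinutes(minutesA);
        LocalTime endB = startB.plusMinutes(minutesB);
        LocalTime latestStart = startA.isAfter(startB) ? startA : startB;
        LocalTime earliestEnd = endA.isBefore(endB) ? endA : endB;
        // A negative duration means the intervals do not overlap at all.
        long overlap = Duration.between(latestStart, earliestEnd).toMinutes();
        return Math.max(0, overlap);
    }
}
```

A Time Series Database would usually express the same overlap as a join on time ranges. Intervals that cross midnight would need full timestamps instead of `LocalTime`.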

Same GPS

Last but not least, GPS coordinates can be used to match devices which have similar GPS coordinates at the same time.

We see different matching results for the different smartphone pairs. Smartphones 1 and 2 were likely within a radius of 11 meters of each other for 12 minutes, whereas the pairs 1 and 4 as well as 2 and 4 were near one another for only 3 minutes.

Fused querying; Signal strength; Downsampling

All in all, the queries shown above are examples that can be combined. One can write queries that select devices which were in the same WiFi at the same time, or at the same geolocation.

Using these different criteria together provides a higher matching accuracy than pure geolocation can achieve.

Furthermore, the signal strength, the accuracy likelihood and the concrete timing of when a GPS or other signal was found can be used in addition. These values help to introduce new matching criteria and to improve the matching results.

Another important factor in gaining precision is downsampling and aggregation, which Time Series Databases support natively. Signal readings are condensed over a time interval to extract averages, peaks and the most common signal values in that time.

A high likelihood of a Corona infection typically implies a contact of about 15 minutes within a distance of two meters. Hence, signal readings can be condensed via aggregation rules into time intervals shorter than 15 minutes (e.g. 5 minutes), and the most descriptive readings are then used for matching. These most descriptive factors can be the average signal strength of received WiFis or the decimal digits of the GPS coordinate which did not change in that time. Typically this downsampling and aggregation into time intervals

Downsampling, signal readings and querying for multiple factors can then be used together to increase the matching accuracy for potentially infected people.
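
Outside of a Time Series Database, the downsampling step can be illustrated in a few lines. The sketch below averages signal readings per fixed time bucket (e.g. 300 seconds for five minutes); the class name, the method signature and the choice of the mean as the representative value are assumptions for illustration only.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: condenses raw signal readings into fixed time buckets. */
final class Downsampler {

    /**
     * @param epochSeconds  reading timestamps in seconds since the epoch
     * @param signalDb      signal strength per reading, in the same order as epochSeconds
     * @param bucketSeconds bucket width, e.g. 300 for five minutes
     * @return bucket start (epoch seconds) mapped to the mean signal strength in that bucket
     */
    static Map<Long, Double> averagePerBucket(long[] epochSeconds, int[] signalDb, long bucketSeconds) {
        Map<Long, long[]> sumAndCount = new TreeMap<>();
        for (int i = 0; i < epochSeconds.length; i++) {
            long bucketStart = Math.floorDiv(epochSeconds[i], bucketSeconds) * bucketSeconds;
            long[] acc = sumAndCount.computeIfAbsent(bucketStart, k -> new long[2]);
            acc[0] += signalDb[i]; // running sum
            acc[1]++;              // reading count
        }
        Map<Long, Double> averages = new TreeMap<>();
        for (Map.Entry<Long, long[]> e : sumAndCount.entrySet()) {
            averages.put(e.getKey(), (double) e.getValue()[0] / e.getValue()[1]);
        }
        return averages;
    }
}
```

Peaks or the most common value per bucket can be computed the same way by swapping the accumulator. In practice a Time Series Database would perform this aggregation as part of the query.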

Value of extended Time Series Analysis

Extended location tracking based on signals besides GPS opens new possibilities for determining precisely when people were in range of the same Bluetooth device or in the same WiFi. One could even go further and use the earth's magnetic field for positioning, or simply compare magnetic fingerprint readings from smartphones.

All readings and signals together with GPS increase the likelihood of a correct location match when people have been near one another.

Storing this time series data in a Time Series Database then provides an efficient way of aggregating and downsampling.

Extended location fingerprinting

The huge downside of extended Coronavirus geotracking is privacy, where other approaches that do without such data transmission offer more secrecy for the user.

Privacy in Coronavirus geotracking can be achieved with different means. Not all possibilities offer only advantages.

With readings of which devices and WiFis a user is connected to, even more information than the pure geolocation is available.

In addition, one can mine habits even without artificial intelligence, just by querying the Time Series Database in which the data is stored. For instance, one can find out in which weeks a user is connected to which WiFi and at which geopositions these WiFis are located.

Therefore, the geolocation needs to be fuzzified, and the signal readings and received devices must not be transmitted over the internet.

Geoposition fuzzification

One way to fuzzify the geoposition in Coronavirus geotracking is to remove decimals from the GPS coordinate until the desired level of fuzzification is reached.
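
Removing decimals can be sketched as follows. The class name `CoordinateFuzzer` is hypothetical, and truncation (rather than rounding) is an assumption about what "removing decimals" means here.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/** Hypothetical helper: coarsens a coordinate by cutting off decimal places. */
final class CoordinateFuzzer {

    /** Cuts a coordinate in degrees to the given number of decimal places (toward zero). */
    static double truncate(double coordinate, int decimals) {
        return BigDecimal.valueOf(coordinate)
                .setScale(decimals, RoundingMode.DOWN)
                .doubleValue();
    }
}
```

For example, `truncate(49.2949, 3)` yields 49.294. Each removed decimal coarsens the position by a factor of ten: four decimals correspond to roughly 11 m in latitude, three decimals to roughly 110 m.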

The geolocation can also be fuzzified and abstracted via a Mercator projection.

The Mercator projection is used in zoomable maps, where a map can be viewed at different resolutions.

The advantage of using a Mercator projection to abstract the geoposition to a tile ID is that the two-dimensional position becomes a one-dimensional ID, which is often easier for further processing. However, simply removing decimals from a GPS position works, too.
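
A tile ID can be derived with the tile numbering that many web maps use on top of the Web Mercator projection ("slippy map" tiles). The sketch below assumes that scheme; the class name and the way row and column are packed into a single number are illustrative choices, and the zoom level is assumed to lie between 0 and 30.

```java
/** Hypothetical helper: maps a GPS position to a Web Mercator tile ID. */
final class MercatorTile {

    /** Tile column for a longitude in degrees at the given zoom level. */
    static int tileX(double lonDeg, int zoom) {
        int n = 1 << zoom; // number of tiles per axis
        int x = (int) Math.floor((lonDeg + 180.0) / 360.0 * n);
        return Math.min(Math.max(x, 0), n - 1);
    }

    /** Tile row for a latitude in degrees at the given zoom level. */
    static int tileY(double latDeg, int zoom) {
        int n = 1 << zoom;
        double latRad = Math.toRadians(latDeg);
        double y = (1.0 - Math.log(Math.tan(latRad) + 1.0 / Math.cos(latRad)) / Math.PI) / 2.0 * n;
        return Math.min(Math.max((int) Math.floor(y), 0), n - 1);
    }

    /** Packs row and column into one number; the packing is an illustrative choice. */
    static long tileId(double latDeg, double lonDeg, int zoom) {
        return ((long) tileY(latDeg, zoom) << zoom) | tileX(lonDeg, zoom);
    }
}
```

The zoom level controls the fuzziness: a lower zoom level means larger tiles and therefore a coarser position. Two nearby readings such as 49.2949,8.6468 and 49.294,8.646 fall into the same tile at zoom level 10. Positions close to a tile border can still end up in different tiles, so a matching would typically also consider neighbouring tiles.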

Privacy and location fingerprinting

The fuzzification of the geolocation alone might not provide enough anonymization.

One way to avoid sending signal and geolocation details to a server and a Time Series Database is to compute a fingerprint of the situation a smartphone observes and then derive a one-way signature from it.

Different smartphones receive different signals: the time, WiFi, GPS and Bluetooth signals differ. The signals can be used to compute a fingerprint via locality-sensitive hashing. Fingerprints computed from similar signals have only a small distance to one another.
Extended location fingerprinting: Smartphones receive radio and GPS signals and use them to compute time-based location fingerprints.
The more the time, GPS, WiFi and Bluetooth signals differ, the larger the distance between the resulting fingerprints. These distances can then be compared to find potentially Coronavirus-infected people.

One can use normal hashing techniques, but the problem is that different smartphones might have fluctuating readings, which would then lead to a different fingerprint.

A better possibility is to compute fingerprints via locality-sensitive hashing (LSH).

LSH is an algorithmic technique that hashes similar input items into the same “buckets” with high probability.

https://en.wikipedia.org/wiki/Locality-sensitive_hashing

Locally on a smartphone, signals can be collected for a certain time interval. Once a time interval is complete, the representative signal strength (e.g. the average value or a main quantile) can be combined with the fuzzified geoposition and other values via locality-sensitive hashing.

Another smartphone that is near this situation will likely arrive at a similar fingerprint because of the similarities.
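
To make the idea concrete, the sketch below uses SimHash, one well-known locality-sensitive hashing scheme; the text above does not name a specific scheme, so this choice, the class name and the feature strings in the usage note are assumptions. Every observed feature votes on each of the 64 fingerprint bits, so two feature sets that mostly agree tend to produce fingerprints with a small Hamming distance.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collection;

/** Hypothetical sketch: 64-bit SimHash fingerprint over observed location features. */
final class LocationFingerprint {

    /** 64-bit FNV-1a hash of a feature string. */
    static long fnv1a64(String feature) {
        long hash = 0xcbf29ce484222325L;
        for (byte b : feature.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xff);
            hash *= 0x100000001b3L;
        }
        return hash;
    }

    /** SimHash: each feature votes on every bit; the majority decides the fingerprint bit. */
    static long simHash(Collection<String> features) {
        int[] votes = new int[64];
        for (String feature : features) {
            long h = fnv1a64(feature);
            for (int bit = 0; bit < 64; bit++) {
                votes[bit] += ((h >>> bit) & 1L) == 1L ? 1 : -1;
            }
        }
        long fingerprint = 0L;
        for (int bit = 0; bit < 64; bit++) {
            if (votes[bit] > 0) {
                fingerprint |= 1L << bit;
            }
        }
        return fingerprint;
    }

    /** Number of differing bits; a smaller value means more similar fingerprints. */
    static int hammingDistance(long a, long b) {
        return Long.bitCount(a ^ b);
    }
}
```

Features could be strings such as `"wifi:5d:03:4c:ea:21:d9"`, `"bt:b3:c0:83:16:27:cd"` or a fuzzified position. Only the resulting `long` would be sent to the server, and matching would compare fingerprints by `hammingDistance` against a threshold that has to be tuned on real data.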

The fingerprints are then submitted to the server, and the Time Series Database will only contain the following entries:

Time from | Duration (min) | Fingerprint | Smartphone
14:00 | 15 | AEFGIJ-845D-UZ00-1122004986 | 1
14:03 | 15 | 000000-005D-UZ00-0000334986 | 2

Now one can compute similarities between the fingerprints, but the actual information is more anonymous than before. It is certainly not a perfect model for anonymizing the data, but it fuzzifies it a bit.

To make the fingerprint even more anonymous, the current day can be included in it, which makes it even harder to attack.

It is even possible to include the duration and the time in the fingerprint, which would then have a direct impact on the LSH distance between two fingerprints. In the following, we show an example with the timestamp included in the fingerprint and the duration excluded.

Duration (min) | Fingerprint | Smartphone
15 | AEFGIJ-845D-UZ00-1122004986-07 | 1
15 | 000000-005D-UZ00-0000334986-08 | 2

The downside is that queries on the individual features with logical conditions (e.g. “Same Bluetooth and GPS distance”) are no longer easily possible. To allow them, each signal group would need to be fingerprinted on its own, which would in turn reduce anonymity.

After all, privacy is a trade-off against time series analysis capabilities. We described how locality-sensitive hashing helps; each specific project will need to decide which level of privacy and functionality it requires.

If the reader has additional ideas that they consider noteworthy, please contact us with feedback.

Conclusion

We discussed the differences between Coronavirus geotracking and privacy-preserving Bluetooth solutions. We gave insights into how a Coronavirus geotracking app works as well as into concrete implementation details for Android.

Afterwards, we discussed how Time Series Data Analysis can be done with a Time Series Database. We showed how queries on top of the smartphone data from Coronavirus geotracking can be executed and what value the Time Series Analysis contributes here.

Finally, we talked about possible privacy extensions for Coronavirus geotracking. Geolocation and signal fingerprinting with locality-sensitive hashing showed that more privacy is generally possible. We saw first indications that more privacy likely also means more limitations in Time Series Analysis capabilities.

All in all, Coronavirus geotracking and Time Series Analysis of signals and GPS data open new possibilities for tracking Coronavirus infections and doing extended analysis.

From a technological perspective, we see that Time Series Databases ease this analysis case.

Because the data is transmitted over the web and stored in central databases, the privacy risk is inherently higher than with pure Bluetooth tracking. At the same time, the analysis capabilities open a completely new set of possibilities.

Ultimately, it is up to societies and their politicians to weigh the benefits of geotracking against more anonymous tracking methods.
