Android Sensors - From Github

From Beiwe Wiki
Revision as of 13:37, 18 December 2017 by Msimoneau



beiwedata is a set of Python scripts designed to help munge, analyze, and manipulate data generated by the Beiwe application.


from beiwedata import * is the standard way to import the package, and this document assumes beiwedata has been imported at the top level. If you import it under another name (e.g., import beiwedata as bd), adjust the usage examples accordingly.

Data overview

There are a total of 12 types of files generated by the Beiwe app. Files are stored as comma-separated values (csv) files, and every file contains column headers (including empty files). The app creates files periodically, and a file is empty when the app records no data during that period. For example, the device runs its periodic WiFi check, but the WiFi transceiver may be disabled or may simply pick up no local WiFi networks. Each file is later "retired" by the app so that it can be uploaded and deleted from the device.
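
Because every file keeps its header row even when nothing was recorded, an "empty" file is one with a header and no data rows. The following is a minimal sketch using Python's standard csv module; the function name has_data_rows is hypothetical and is not part of beiwedata, and the header used in the usage lines is only illustrative.

```python
import csv
import io


def has_data_rows(stream) -> bool:
    """Return True if a Beiwe CSV has at least one row after its header line.

    Assumes the first line is always the column header (as described above),
    so a header-only file counts as "empty".
    """
    reader = csv.reader(stream)
    next(reader, None)  # skip the header row; None if the stream is completely blank
    return any(row for row in reader)  # blank lines parse as [] and are ignored


# Usage with in-memory text; for real files pass open(path, newline="").
print(has_data_rows(io.StringIO("timestamp,accuracy,x,y,z\n")))  # False
print(has_data_rows(io.StringIO("timestamp,accuracy,x,y,z\n1,3,0.1,0.2,9.8\n")))  # True
```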

#    Data stream      File prefix
1    Accelerometer    accel
2    Bluetooth        bluetoothLog
3    Phone calls      callLog
4    GPS              gps
5    IDs              identifiers
6    Beiwe logs       logFile
7    Power State      powerState
8    Survey (1)       surveyAnswers
9    Survey (2)       surveyTimings
10   Text messages    textsLog
11   Voice memos      voiceRecording
12   WiFi             wifiLog


Each file prefix above is followed by _[timestamp].csv, where the timestamp is Java time in UTC (milliseconds from epoch) and records when the file was created. This timestamp may therefore differ from the timestamp you see online, which records when the file was uploaded. (NOTE: The app is designed to upload only when it is connected to WiFi.)
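
Splitting a file name into its prefix and creation time can be sketched as below. The prefix-to-stream mapping comes from the table above; the regular expression, the function name parse_filename, and the restriction to .csv names are assumptions of this sketch rather than part of beiwedata.

```python
import re
from datetime import datetime, timezone

# File prefixes from the table above, mapped to their data streams.
PREFIXES = {
    "accel": "Accelerometer",
    "bluetoothLog": "Bluetooth",
    "callLog": "Phone calls",
    "gps": "GPS",
    "identifiers": "IDs",
    "logFile": "Beiwe logs",
    "powerState": "Power State",
    "surveyAnswers": "Survey (1)",
    "surveyTimings": "Survey (2)",
    "textsLog": "Text messages",
    "voiceRecording": "Voice memos",
    "wifiLog": "WiFi",
}

# prefix, underscore, Java millisecond timestamp, ".csv" (assumed pattern)
FILENAME_RE = re.compile(r"^([A-Za-z]+)_(\d+)\.csv$")


def parse_filename(name: str):
    """Return (data stream, file-creation time as an aware UTC datetime)."""
    match = FILENAME_RE.match(name)
    if match is None or match.group(1) not in PREFIXES:
        raise ValueError(f"not a recognized Beiwe file name: {name!r}")
    prefix, millis = match.group(1), int(match.group(2))
    created = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
    return PREFIXES[prefix], created
```

For example, parse_filename("gps_1513604220000.csv") gives the GPS stream and a creation time of 13:37 UTC on 18 December 2017. Remember that this is the creation time, not the upload time.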

Most conversion tools are designed for Unix time (seconds from epoch), so divide by 1000 first. For example, in Python, use timestamp // 1000 (or timestamp / 1000 to keep the milliseconds as a fraction) before passing the value to the datetime module to convert it to human-readable time. In Microsoft Excel one might use a formula such as =((timestamp/1000) / 86400) + 25569 and then format the cell as a date/time.
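
Here is a small sketch of both conversions in Python. It uses divmod so the millisecond part is kept rather than discarded. The function names are hypothetical, and excel_serial simply mirrors the Excel formula above (25569 is the Excel serial day number of 1970-01-01).

```python
from datetime import datetime, timezone


def java_millis_to_utc(millis: int) -> datetime:
    """Convert a Java (milliseconds-from-epoch) timestamp to an aware UTC datetime."""
    seconds, ms = divmod(int(millis), 1000)  # whole seconds plus leftover milliseconds
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(microsecond=ms * 1000)


def excel_serial(millis: int) -> float:
    """Same arithmetic as the Excel formula: days since Excel's epoch."""
    return (millis / 1000) / 86400 + 25569


print(java_millis_to_utc(1513604220123))  # 2017-12-18 13:37:00.123000+00:00
```

Call .astimezone() on the result if you need the participant's local time rather than UTC.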

See EpochConverter for more ways to convert Unix time in various programming languages.

Accelerometer data

Accelerometer files contain 5 columns: timestamp, accuracy, x, y, and z. Accelerometer on and off periods may vary by study, and sampling rates vary by device and possibly by Android OS version as well. Many devices limit sampling while the screen is off, and a few disable the accelerometer completely. Sampling is at its highest frequency while the screen is on, but that rate also differs by OS and phone.
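
Because the sampling rate is device-dependent, it can help to estimate it from the data itself. This sketch takes the median gap between consecutive timestamps, which is less affected by on/off gaps than the mean. It assumes the header names are exactly timestamp, accuracy, x, y, z, and the function name is hypothetical. A file that spans both screen-on and screen-off sampling will still yield only one rough figure.

```python
import csv
import io
from statistics import median


def sampling_rate_hz(stream) -> float:
    """Estimate an accelerometer file's effective sampling rate in Hz."""
    times = [int(row["timestamp"]) for row in csv.DictReader(stream)]
    gaps = [b - a for a, b in zip(times, times[1:]) if b > a]  # milliseconds
    if not gaps:
        raise ValueError("need at least two distinct timestamps")
    return 1000 / median(gaps)


sample = io.StringIO(
    "timestamp,accuracy,x,y,z\n"
    "0,3,0.1,0.2,9.8\n"
    "20,3,0.1,0.3,9.7\n"
    "40,3,0.0,0.2,9.8\n"
)
print(sampling_rate_hz(sample))  # 50.0
```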

NOTE: The bounds of the x, y, and z values are specific to each phone model. In all our test data the bounds are [-20, 20], but research by the developers indicates that for some devices it is [-10, 10]. It is unclear if that bound value is derived after accounting for acceleration due to gravity. The data recorded by the app is raw accelerometer data and has not been modified to remove acceleration due to gravity.
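
Since the recorded values still include gravity, one crude, orientation-independent adjustment is to take the vector magnitude and subtract 1 g. This sketch assumes x, y, and z are in m/s² (the unit Android's accelerometer API reports). It is a simple constant subtraction, not a proper gravity filter, and the function name is hypothetical.

```python
import math

STANDARD_GRAVITY = 9.80665  # m/s^2


def dynamic_magnitude(x: float, y: float, z: float) -> float:
    """Acceleration magnitude minus a constant 1 g (assumes m/s^2 inputs).

    A phone lying still gives roughly 0; negative values are possible
    because gravity is removed as a constant, not per axis.
    """
    return math.sqrt(x * x + y * y + z * z) - STANDARD_GRAVITY
```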

Bluetooth
Bluetooth files contain: timestamp, MAC, and RSSI. Note that MAC is actually the hashed MAC address of other devices. RSSI is in dBm. Note: due to restrictions introduced in Android 6, only a few devices can still report their MAC address; devices that cannot should report "N/A" for their MAC address instead.
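
A per-device summary has to skip the "N/A" entries. The sketch below counts sightings and keeps the strongest RSSI (the value closest to 0 dBm) for each hashed MAC. It assumes the header names are exactly timestamp, MAC, RSSI. The function name and the sample hashes are hypothetical.

```python
import csv
import io


def summarize_devices(stream):
    """Map each hashed MAC to (number of sightings, strongest RSSI in dBm)."""
    summary = {}
    for row in csv.DictReader(stream):
        mac = row["MAC"].strip()
        if not mac or mac == "N/A":
            continue  # no usable device identity
        rssi = float(row["RSSI"])
        count, best = summary.get(mac, (0, rssi))
        summary[mac] = (count + 1, max(best, rssi))  # closer to 0 dBm is stronger
    return summary


scans = io.StringIO(
    "timestamp,MAC,RSSI\n"
    "1,aaa,-70\n"
    "2,aaa,-55\n"
    "3,N/A,-40\n"
    "4,bbb,-80\n"
)
print(summarize_devices(scans))  # {'aaa': (2, -55.0), 'bbb': (1, -80.0)}
```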

Phone calls

Call logs contain: hashed phone number, call type, date, and duration in seconds. Note that date is the equivalent of timestamp in other files (i.e., it is not a human-readable datetime object).
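
Because date holds Java milliseconds, it has to be converted before it can be read as a date. This sketch totals call duration per call type and reports the first and last call times in UTC. It assumes the header names are exactly hashed phone number, call type, date, duration in seconds. The call-type strings in the sample ("Outgoing Call", "Incoming Call") are hypothetical, as is the function name.

```python
import csv
import io
from collections import Counter
from datetime import datetime, timezone


def summarize_calls(stream):
    """Return (seconds per call type, first call time, last call time) in UTC."""
    totals = Counter()
    dates = []
    for row in csv.DictReader(stream):
        totals[row["call type"]] += int(row["duration in seconds"])
        dates.append(int(row["date"]))  # Java milliseconds, not a formatted date
    if not dates:  # header-only (empty) file
        return {}, None, None
    first = datetime.fromtimestamp(min(dates) / 1000, tz=timezone.utc)
    last = datetime.fromtimestamp(max(dates) / 1000, tz=timezone.utc)
    return dict(totals), first, last


log = io.StringIO(
    "hashed phone number,call type,date,duration in seconds\n"
    "h1,Outgoing Call,1513604220000,30\n"
    "h2,Incoming Call,1513604280000,45\n"
    "h1,Outgoing Call,1513604340000,15\n"
)
totals, first, last = summarize_calls(log)
print(totals)  # {'Outgoing Call': 45, 'Incoming Call': 45}
```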

GPS
GPS files contain: time, latitude, longitude, altitude, and accuracy. Note that time here is the equivalent of timestamp in other files; the developers may rename it in future versions for consistency. Accuracy is given in meters.
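
Because accuracy is reported in meters, a common first step is to discard imprecise fixes before computing anything. The next sketch keeps fixes at or below a cutoff and then sums great-circle (haversine) distances between consecutive fixes. The 50 m default cutoff and the function names are hypothetical. The haversine formula with a 6,371 km mean Earth radius is a standard approximation, not a method taken from Beiwe. The sketch also assumes the header names are exactly time, latitude, longitude, altitude, accuracy.

```python
import csv
import io
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def path_length_m(stream, max_accuracy_m=50.0):
    """Sum distances between time-ordered fixes whose accuracy is within the cutoff."""
    fixes = []
    for row in csv.DictReader(stream):
        if float(row["accuracy"]) <= max_accuracy_m:  # horizontal accuracy only
            fixes.append((int(row["time"]), float(row["latitude"]), float(row["longitude"])))
    fixes.sort()
    return sum(haversine_m(a[1], a[2], b[1], b[2]) for a, b in zip(fixes, fixes[1:]))


track = io.StringIO(
    "time,latitude,longitude,altitude,accuracy\n"
    "0,0,0,0,10\n"
    "1000,1,0.5,0,500\n"
    "2000,0,1,0,10\n"
)
print(round(path_length_m(track)))  # 111195 (the 500 m fix is dropped)
```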

NOTE: Altitude accuracy varies significantly from device to device, see this StackOverflow post for details. The value of the accuracy field only applies to horizontal accuracy.

Identifier File

Identifier files should contain only one row, and in most cases there will be only one such file. The file is recreated if a user ID is re-registered, and it will contain different identifying information if the user is re-registered on a different device. This file contains the user's own patient id, phone_manufacturer, phone_type, OS version, (hashed) MAC, (hashed) phone_number, and device_id.

NOTE: The device ID is sourced from an operating system value called ANDROID_ID. It is unique to the device, but will change if a user does a factory reset on their device and then reinstalls the Beiwe app. The hashed MAC is of the phone's Bluetooth MAC address.

Beiwe logs

Beiwe log files contain app-generated messages and should be used only for diagnostic purposes. The data recorded in these files may change between any two versions of the Beiwe app and follow no consistent format.

Power State

Power logs have two columns: time and event. Note that time here is the equivalent of timestamp elsewhere. Power state events are things like screen on, screen off, power connected and power disconnected.

The following data points were introduced in Beiwe version 10. Note: the strings are intentionally over-explicit because Android only provides a notification that the state has changed, and the app must then check the current state. These operations are not atomic, so there is a small chance that the checked state is already out of date.

  1. Doze: Doze is a power state (technically a sleep state) added in Android 6; see Android's developer documentation on Doze for details. Summary: the device enters Doze only if it is unplugged, the screen is off, and the accelerometer indicates no or minimal device motion. It periodically leaves this state to run scheduled tasks (a "maintenance window"), with the sleep periods between windows growing progressively longer. The device also wakes up if any of these conditions stops holding, or if the operating system identifies a "significant location change." All scheduled data recording sessions and survey notifications are delayed until the device wakes up or enters a maintenance window.

Device Idle (Doze) state change signal received; device in idle state.
Device Idle (Doze) state change signal received; device not in idle state.

  2. "Power Save State": introduced in Android 5. The extent of the documentation is "When in this mode, applications should reduce their functionality in order to conserve battery as much as possible." Whether the system itself does anything (reducing GPS frequency, for example) appears to be up to the manufacturer, though Android may enforce it globally in the future. It is not well defined precisely when this mode is entered or what it actually does, and currently (Android 6) we do not believe it has a determinable effect on Beiwe Android data collection.

Power Save Mode state change signal received; device in power save state.
Power Save Mode change signal received; device not in power save state.
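
The Doze event strings come in on/off pairs, so time spent in Doze can be approximated by pairing them. The sketch assumes the power-state header names are exactly time and event, that rows are in time order, and that the strings match the ones listed above once a trailing period is stripped. Because the state check is not atomic, repeated "in idle" events can occur, and this sketch keeps the earliest start. The function name is hypothetical.

```python
import csv
import io

# Doze event strings from the list above, compared without a trailing period.
DOZE_ON = "Device Idle (Doze) state change signal received; device in idle state"
DOZE_OFF = "Device Idle (Doze) state change signal received; device not in idle state"


def doze_intervals(stream):
    """Pair Doze on/off events into (start, end) intervals in Java milliseconds.

    An unmatched final "in idle" event is left open and not returned.
    """
    intervals, start = [], None
    for row in csv.DictReader(stream):
        event = row["event"].strip().rstrip(".")
        t = int(row["time"])
        if event == DOZE_ON and start is None:
            start = t
        elif event == DOZE_OFF and start is not None:
            intervals.append((start, t))
            start = None
    return intervals


log = io.StringIO(
    "time,event\n"
    "1000,Device Idle (Doze) state change signal received; device in idle state.\n"
    "5000,Device Idle (Doze) state change signal received; device not in idle state.\n"
)
print(doze_intervals(log))  # [(1000, 5000)]
```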

Survey Answers

Varies by survey. (Note that this file also contains the text of the questions.)

Survey Timings

Varies by survey.

Text messages

Text logs contain: timestamp, hashed phone number, sent vs received, message length, and time sent. In general, time sent should be ignored: in theory it is the time the message was sent (by somebody else), while timestamp is the time the message would have been received by the user. In practice, the two should be identical or very similar.
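
To check the claim that the two times are nearly identical in your own data, you can compute timestamp minus time sent for each row. This sketch assumes the header names are exactly as listed above and that time sent is also in Java milliseconds. The sent/received values in the sample ("received SMS", "sent SMS") and the function name are hypothetical.

```python
import csv
import io


def receipt_lags_ms(stream):
    """Return timestamp minus time sent for each row that has a time sent value."""
    lags = []
    for row in csv.DictReader(stream):
        sent = row["time sent"].strip()
        if sent:  # skip rows with no time sent recorded
            lags.append(int(row["timestamp"]) - int(sent))
    return lags


texts = io.StringIO(
    "timestamp,hashed phone number,sent vs received,message length,time sent\n"
    "2000,h1,received SMS,12,1500\n"
    "3000,h2,sent SMS,8,3000\n"
)
print(receipt_lags_ms(texts))  # [500, 0]
```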

Voice memos

Self-recorded voice memos; these will vary by study. Voice recordings are in one of two formats: mp4 files, which use AAC compression, and wav files, which contain PCM data. All audio files are single-channel (mono).
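
The wav recordings can be inspected with Python's standard wave module, for example to confirm they are mono and to get their duration. The mp4/AAC files need a separate decoder and are not covered here. The function name describe_wav is hypothetical, and the usage builds a synthetic one-second file in memory rather than reading real Beiwe data.

```python
import io
import wave


def describe_wav(source):
    """Return (channels, sample rate in Hz, duration in seconds) for a PCM wav file.

    `source` may be a file path or a binary file object.
    """
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        return wav.getnchannels(), rate, wav.getnframes() / rate


# Usage: write one second of silent 16-bit mono PCM to memory, then inspect it.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)  # 16-bit samples
    out.setframerate(8000)
    out.writeframes(b"\x00\x00" * 8000)
buffer.seek(0)
print(describe_wav(buffer))  # (1, 8000, 1.0)
```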

WiFi
WiFi log files contain: hashed MAC, frequency, and RSSI. Frequency will always be in the 2.4 GHz or 5 GHz band, so it may be of limited use. RSSI is in dBm.
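
A per-file summary can count distinct networks, group rows by band, and find the strongest signal. This sketch assumes the header names are exactly hashed MAC, frequency, RSSI. It also assumes frequency is stored in MHz (e.g., 2412), as Android's ScanResult.frequency reports it. Whether Beiwe stores it that way is an assumption to verify against your files. The function names and the band cutoffs are illustrative.

```python
import csv
import io
from collections import Counter


def wifi_band(freq_mhz: float) -> str:
    """Classify a WiFi channel frequency (assumed to be in MHz) into its band."""
    if 2400 <= freq_mhz < 2500:
        return "2.4 GHz"
    if 4900 <= freq_mhz < 5925:
        return "5 GHz"
    return "other"


def summarize_wifi_file(stream):
    """Return (distinct hashed MACs, rows per band, strongest RSSI in dBm)."""
    macs, bands, strongest = set(), Counter(), None
    for row in csv.DictReader(stream):
        macs.add(row["hashed MAC"])
        bands[wifi_band(float(row["frequency"]))] += 1
        rssi = float(row["RSSI"])
        strongest = rssi if strongest is None else max(strongest, rssi)
    return len(macs), dict(bands), strongest


scan = io.StringIO(
    "hashed MAC,frequency,RSSI\n"
    "m1,2412,-60\n"
    "m2,5180,-72\n"
    "m1,2412,-58\n"
)
print(summarize_wifi_file(scan))  # (2, {'2.4 GHz': 2, '5 GHz': 1}, -58.0)
```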