Beiwe Data Privacy and Security

From Beiwe Wiki
Revision as of 12:38, 26 September 2017 by Beiweadmin (talk | contribs)

Key security aspects of the Beiwe Research Platform

  • Participant names are coded with a unique 8-character Beiwe Participant ID.
  • Participants will log in to the Beiwe smartphone application with their unique ID and password.
  • All data collection is tied to the 8-character Beiwe Participant ID (not to identifiers like participant name or contact information), and only clinical research collaborators will have access to the master key linking IDs to participants, which will be stored securely.
  • All data is encrypted in transit and at rest. The application will not store data on the participants’ mobile device in an unencrypted form.
  • Audio recordings (voice surveys) will be encrypted once recording is complete.
  • Indirect identifiers (telephone numbers and IP addresses) will be hashed using an industry-recognized strong hashing algorithm, which renders these identifiers unreadable.
  • No identifiable data will be stored on the mobile device. All identifiers, except audio recordings (voice surveys), will be rendered innocuous by hashing.

Types of Data Collected

Passive Data

Data that is generated without any direct involvement from the subject, such as GPS data and accelerometer data.

Active Data

Data that requires active participation from the subject for its generation, such as surveys and audio samples.

Supported Data Streams

Data availability based on operating system

Data Anonymity

Participant Anonymity

Every participant is assigned a randomly generated 8-character participant ID (for example, “d4w192bg”), and all participant data are connected only to that ID.
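A randomly generated ID of this kind can be sketched as follows. This is a minimal illustration, not Beiwe's actual generator: the function name `generate_participant_id` and the alphabet of lowercase letters and digits are assumptions inferred from the example ID above.

```python
import secrets
import string

# Assumed alphabet: lowercase letters and digits, as in the example "d4w192bg".
ALPHABET = string.ascii_lowercase + string.digits
ID_LENGTH = 8

def generate_participant_id(existing_ids=frozenset()):
    """Return a random 8-character ID that is not already in use."""
    while True:
        # secrets.choice draws from a cryptographically strong random source.
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(ID_LENGTH))
        if candidate not in existing_ids:
            return candidate
```

Checking against the set of existing IDs keeps each ID unique, so all of a participant's data can be tied to the ID alone.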

Other Data Anonymity

If collected, these four types of data are hashed by Beiwe using the industry-standard SHA-256 hashing algorithm:

  • Phone numbers of incoming and outgoing phone calls
  • Phone numbers of incoming and outgoing text messages
  • MAC addresses of nearby Wi-Fi routers
  • MAC addresses of nearby Bluetooth devices

Hashing these data means that each phone number and MAC address is turned into a fixed-length string of seemingly random numbers and letters (a SHA-256 digest is 32 bytes long, or 64 characters when written in hexadecimal), but a given phone number is always transformed into the same string. It is also computationally infeasible, given present-day knowledge, to reverse a hash.

Imagine that participant d4w192bg called the phone number 617-123-4567 once on Monday, and then received two calls from that same phone number on Tuesday. A researcher analyzing the Beiwe data could see when those three calls happened and could tell that they all involved the same phone number, but could not tell what that phone number was.

Hashing the MAC addresses of Wi-Fi routers has the same effect. A researcher analyzing Beiwe data could tell that a participant was near a certain Wi-Fi router at 10am every morning, Monday through Friday, and could thereby surmise that the participant was probably in the same place at that time each day, but the data would not reveal the actual MAC address of the router.
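The property described above, that the same input always maps to the same digest while the digest itself does not show the input, can be seen with Python's standard `hashlib` module. This is only a sketch: the helper name `hash_identifier` is hypothetical, the digest is shown hex-encoded, and Beiwe's actual output encoding and any salting are not specified here.

```python
import hashlib

def hash_identifier(value: str) -> str:
    """Return the SHA-256 digest of an identifier as a hex string."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

monday_call = hash_identifier("617-123-4567")
tuesday_call = hash_identifier("617-123-4567")
other_call = hash_identifier("617-123-4568")

print(monday_call == tuesday_call)  # True: same number, same digest
print(monday_call == other_call)    # False: different number, different digest
print(len(monday_call))             # 64 hex characters (32 bytes)
```

A researcher comparing `monday_call` and `tuesday_call` can tell the calls involve the same number without ever seeing the number itself.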

Potential Gaps in Data Anonymity

Two types of data can potentially contain personally identifiable information: the GPS data (which record location) and the voice recording data (in which a participant could mention personally identifiable information).

Beiwe’s GPS data provide enough detail to identify individual buildings or street addresses within some degree of confidence, although a fair amount of analysis would be required to transform a series of GPS coordinates into a home address for a participant. Data collection in the Beiwe smartphone application is customizable, so a particular study can disable GPS data collection if it is not part of the research question.

The app’s voice recording feature does not ask for any identifiable information, but it is conceivable that, in the course of describing their day, a participant could speak their own name or reveal other identifying details. To reduce this risk, researchers can add text to the application's voice recording screen asking participants NOT to mention their name, the names of any other people, or any specific locations. An example of user interface text that can be shown on the Voice Recording screen is as follows: “Please describe how you’ve felt over the last 24 hours in relation to events that have occurred as well as to upcoming events that are on your mind. It is okay to describe situations and people abstractly (‘friend’, ‘restaurant’) but avoid specific names. When you are ready, press ‘Record’ and speak for no more than 4 minutes. Press ‘Stop’ when you are finished, ‘Play’ to listen to the recording, and ‘Done’ to submit the recording.”

Participant Authentication

Login Protection

Participants must log in with a password of at least 6 characters to use the app in any capacity. All functions of the app, including filling out surveys and making voice recordings, are protected behind a login wall. The only parts of the app that are not login-protected are the “Call My Clinician” button and the notification reminders that say either “please take a survey” or “please make a voice recording.” The app automatically logs out after a configurable number of minutes of inactivity.

If a participant forgets their password, they can have it reset by pressing the “Call the Research Assistant” button in the Forgot Password section of the application. The application will inform the participant that they should not reveal their name to the Research Assistant when requesting a password reset. The Research Assistant will give the participant a temporary password over the phone, and the participant is immediately required to choose a new, permanent password in the application. The server does not store the participant’s plaintext password, only the participant’s hashed password.
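Storing only a hashed password can be sketched with Python's standard library as follows. The text does not say which algorithm the Beiwe server uses, so the choice of salted PBKDF2-HMAC-SHA256, the iteration count, and the function names `hash_password` and `verify_password` are all assumptions made for illustration.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000      # illustrative work factor, not a Beiwe setting
MIN_PASSWORD_LENGTH = 6   # minimum length stated in the text

def hash_password(password):
    """Return (salt, digest); only these are stored, never the password."""
    if len(password) < MIN_PASSWORD_LENGTH:
        raise ValueError("password must be at least 6 characters")
    salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest

def verify_password(password, salt, expected_digest):
    """Re-derive the digest from the attempt and compare in constant time."""
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return hmac.compare_digest(digest, expected_digest)
```

Because the server keeps only the salt and digest, a reset has to issue a new temporary password; the old one cannot be looked up.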

When a participant calls the Research Assistant to request a password reset, the caller's identity is verified by having the participant read their Beiwe Patient ID number to the research assistant. The phone number programmed into the participant’s version of the Beiwe app is that of the clinical collaborator's research assistant (the collaborator’s staff, not Onnela Lab staff). The participant’s Beiwe Patient ID number is displayed in the Forgot Password section of the application, so it is readily available to the participant during the call, as is a prompt reminding the participant not to reveal their name or any identifying information. Verifying the caller by Beiwe Patient ID number alone is more secure than alternative methods (e.g., name or email address).

Signup Safeguards

For a participant to use the Beiwe app, a study administrator must first create a participant ID and temporary password for that participant. Anyone can install the Beiwe app by downloading it for Android devices or from the Apple App Store for iOS devices, but the app does nothing until the user registers it in a particular study with a valid participant ID and password provided by the study coordinators.

A participant ID can only be connected to one phone at a time; if someone tries to register a second phone with a participant ID that is already registered to an existing phone, the second phone will not be able to register.

In order to upload data, a phone must have a valid username, password, and phone ID number. This is to prevent unauthorized phones from spoofing data.
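The registration and upload rules above can be sketched as a small in-memory model. This is a hypothetical illustration, not Beiwe's server code: the class `ParticipantRegistry` and its method names are invented here, and passwords are compared in plaintext only to keep the sketch short (the server described above stores only hashed passwords).

```python
class ParticipantRegistry:
    """Toy model of the signup safeguards; all names are hypothetical."""

    def __init__(self):
        self._passwords = {}  # participant ID -> password (a hash in practice)
        self._phones = {}     # participant ID -> registered phone ID

    def create_participant(self, participant_id, temp_password):
        # Only a study administrator creates IDs and temporary passwords.
        self._passwords[participant_id] = temp_password

    def register_phone(self, participant_id, password, phone_id):
        # Unknown ID or wrong password: registration is refused.
        if self._passwords.get(participant_id) != password:
            return False
        # An ID already bound to a phone cannot be registered again.
        if participant_id in self._phones:
            return False
        self._phones[participant_id] = phone_id
        return True

    def may_upload(self, participant_id, password, phone_id):
        # Uploads need a valid ID, password, and the registered phone ID.
        return (
            participant_id in self._passwords
            and self._passwords[participant_id] == password
            and self._phones.get(participant_id) == phone_id
        )
```

For example, after `register_phone("d4w192bg", "temp123", "phone-A")` succeeds, the same call with `"phone-B"` returns `False`, and `may_upload` accepts only `"phone-A"`.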

Data Encryption

All data on phones, on the server, and in transit are protected with industry-standard encryption techniques. The phone uses asymmetric encryption, meaning that even the phone cannot read its own data; data recorded on the phone can only be read on the server.

During registration, the device is provided with the public half of a 2048-bit RSA key pair. With this key the device can encrypt data, but only the server, which holds the private key, can decrypt it. Bulk data is encrypted with symmetric AES keys, and the RSA public key is used to encrypt each AES key. The AES keys are generated as needed by the app, are not stored in unencrypted form, and must be decrypted by the server before any data can be recovered. Data received by the server are then re-encrypted with a master key provided for that study and stored on Amazon S3, an industry-standard secure storage platform housed in data centers that are protected by armed guards.
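The hybrid scheme described above (an RSA public key protecting a per-use AES key) can be sketched with the third-party Python `cryptography` package. The text does not state the AES mode or RSA padding Beiwe uses, so AES-GCM, OAEP with SHA-256, the 128-bit AES key size, and the function names below are assumptions chosen for illustration.

```python
import os

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Server side: 2048-bit RSA key pair; only the public half goes to the phone.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

# Assumed padding scheme for wrapping the AES key.
OAEP = padding.OAEP(
    mgf=padding.MGF1(algorithm=hashes.SHA256()),
    algorithm=hashes.SHA256(),
    label=None,
)

def encrypt_on_device(device_public_key, plaintext):
    """Encrypt data with a fresh AES key, then wrap that key with RSA."""
    aes_key = AESGCM.generate_key(bit_length=128)  # generated as needed
    nonce = os.urandom(12)
    ciphertext = AESGCM(aes_key).encrypt(nonce, plaintext, None)
    wrapped_key = device_public_key.encrypt(aes_key, OAEP)
    # The plaintext AES key is discarded; only the wrapped key is kept.
    return wrapped_key, nonce, ciphertext

def decrypt_on_server(server_private_key, wrapped_key, nonce, ciphertext):
    """Unwrap the AES key with the RSA private key, then decrypt the data."""
    aes_key = server_private_key.decrypt(wrapped_key, OAEP)
    return AESGCM(aes_key).decrypt(nonce, ciphertext, None)
```

Because the phone holds only `public_key`, it can produce `wrapped_key` and `ciphertext` but cannot recover either one later; only the holder of `private_key` can.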

Amazon Web Services has released a whitepaper[1] describing how EC2 and S3, the two Amazon services Beiwe uses, meet HIPAA compliance standards. Encrypted Beiwe data is stored on the Onnela Lab AWS account, which only Jukka-Pekka Onnela and authorized Onnela Lab staff have login credentials to access. All data connections to the web service hosting the study are negotiated over industry-standard SSL/TLS connections, protecting against man-in-the-middle attacks and packet-sniffing data leaks.

Below is a visual of the data encryption system including the phones, Amazon servers, and the separation of participant information behind a collaborator’s Firewall (if collaborators choose to store patient information electronically).

Data encryption system including the phones, Amazon servers, and the separation of participant information behind a collaborator’s Firewall (if collaborators choose to store patient information electronically).
  1. (