Category Archives: Problems

How the AirBnB-App is tracking your location

AirBnB can be used to find rooms in other cities while you travel. For that purpose, it also offers an official Android Application. As the app requests some dangerous permissions (Location, Contacts, …), I enabled the “privacy guard” feature of CyanogenMod right away, which blocks access to location and contacts and asks the user to confirm each access to one of these resources. Due to these prompts, I noticed that AirBnB requests your location a lot, including while the app is not active (in the background, but not terminated).

This made me curious, so I set up mitmproxy to take a look at the network traffic of the app. Fortunately for me (and unfortunately, in general), while it uses HTTPS to phone home, it does not implement certificate pinning, so it was trivial to get a dump of the requests and responses it sends and receives. And, as it turns out, AirBnB is indeed very curious.
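For readers who have not met certificate pinning before, here is a minimal sketch of the idea, not of anything AirBnB or mitmproxy actually ships. The client knows the fingerprint of the certificate it expects and refuses to talk to anything else, which is exactly what would have stopped my interception. The byte strings below are stand-ins, not real certificates; in a real Python client the bytes could come from `ssl.SSLSocket.getpeercert(binary_form=True)`, and many real implementations pin the public key rather than the whole certificate.

```python
import hashlib
import hmac
from typing import Iterable


def fingerprint(cert_der: bytes) -> str:
    """Return the SHA-256 fingerprint of a DER-encoded certificate as hex."""
    return hashlib.sha256(cert_der).hexdigest()


def is_pinned(cert_der: bytes, pinned_fingerprints: Iterable[str]) -> bool:
    """Accept the certificate only if its fingerprint is one we pinned."""
    presented = fingerprint(cert_der)
    return any(hmac.compare_digest(presented, pin) for pin in pinned_fingerprints)


# Stand-in byte strings for illustration only, NOT real certificates.
real_cert = b"certificate the app developer shipped"
proxy_cert = b"certificate generated by an intercepting proxy"

pins = [fingerprint(real_cert)]

print(is_pinned(real_cert, pins))   # the genuine server is accepted
print(is_pinned(proxy_cert, pins))  # the proxy's certificate is rejected
```

Without a check like this, the app accepts any certificate the device trusts, including one from a proxy whose CA the user installed.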

When is your location disclosed?

The app always sends your current location when it is started. In fact, a whole host of information is sent to AirBnB, including your GPS location with a precision of seven decimals, your current city in human-readable form, your system language and OS version, the type of your device (phone, tablet), and even a bunch of settings you can presumably set if you are logged into your account on the website. Judging from the presence of a “is_logged_in”-Field, I assume that this information will be linked to your account if you are logged into the app (I was not).
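To put "seven decimals" into perspective, here is a rough back-of-the-envelope calculation. It assumes a spherical Earth with a mean radius of 6,371 km and only looks at latitude (the spacing of longitude lines shrinks towards the poles), so treat the numbers as an approximation.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean radius; spherical approximation (assumption)


def metres_per_degree_latitude() -> float:
    """Length of one degree of latitude on a spherical Earth."""
    return math.pi * EARTH_RADIUS_M / 180


def resolution_m(decimals: int) -> float:
    """Smallest step, in metres, expressible with this many decimal places."""
    return metres_per_degree_latitude() * 10 ** -decimals


for decimals in (2, 4, 7):
    print(decimals, "decimals ->", resolution_m(decimals), "m")
```

Seven decimals works out to a step of roughly one centimetre, which is far finer than the few metres of accuracy a phone's GPS can realistically deliver. In other words, the app sends the position at whatever precision the device reports, without rounding.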

The app will also send your GPS location if you search for offers and while it loads the offers in the “discover”-tab (where it will display some featured places and locations you could travel to). It has to be stressed that the location is not actually needed for any of this, it’s just AirBnB being curious and wanting the data for their analysis, I assume (they also use a bunch of other trackers, including Google Analytics, Newrelic, Flurry, and Facebook, but as far as I could find out, they do not disclose the location to these). There are probably a lot of additional cases where your location is sent to AirBnB, but I stopped here, mostly because I was not interested in sending them even more data.

The app also requests your current location every five minutes, but, as far as I can tell, does not send these readings to the server.

What is your location used for?

That is the big question. As the data is not needed to answer your queries, I can only assume that they are using it for their analysis software. So, let’s take a look at their privacy policy:

“When you use certain features of the Platform, in particular our mobile applications we may receive, store and process different types of information about your location, including general information (e.g., IP address, zip code) and more specific information (e.g., GPS-based functionality on mobile devices used to access the Platform or specific features of the platform).”

Okay, interesting. Is there a way to opt out of this?

“If you access the Platform through a mobile device and you do not want your device to provide us with location-tracking information, you can disable the GPS or other location-tracking functions on your device, provided your device allows you to do this. See your device manufacturer’s instructions for further details.”

Oh. Okay. And for what, precisely, are you using the data?

We use and process Information about you for the following general purposes:

  1. to enable you to access and use the Platform;
  2. to operate, protect, improve and optimize the Platform, Airbnb’s business, and our users’ experience, such as to perform analytics, conduct research, and for advertising and marketing;
  3. to help create and maintain a trusted and safer environment on the Platform, such as fraud detection and prevention, conducting investigations and risk assessments, verifying the address of your listings, verifying any identifications provided by you, and conducting checks against databases such as public government databases;
  4. to send you service, support and administrative messages, reminders, technical notices, updates, security alerts, and information requested by you;
  5. where we have your consent, to send you marketing and promotional messages and other information that may be of interest to you, including information sent on behalf of our business partners that we think you may find interesting. You will be able to unsubscribe or opt-out from receiving these communications in your settings (in the “Account” section) when you login to your Airbnb account;
  6. to administer rewards, surveys, sweepstakes, contests, or other promotional activities or events sponsored or managed by Airbnb or our business partners; and
  7. to comply with our legal obligations, resolve any disputes that we may have with any of our users, and enforce our agreements with third parties.

So, basically, they reserve the right to do whatever they want with your data. Great.

Why is this bad?

Your current location is not their business (quite literally). They only offer one function that technically requires them to know your current location, and that is “accommodations around me”. In all other situations, your current location is not needed to serve your request, so it should not be disclosed to them. This is not some esoteric concept; this is basic privacy. Also, the best way to prevent the misuse of personal information is not to collect the information in the first place.

AirBnB’s reaction

I contacted the AirBnB-Support via Twitter and, later, via eMail. The response I got wasn’t very helpful:

The current location is requested in order to provide you rapidly with listings around your area whenever you go to search for a place. You should receive that request when starting it.

This may explain the periodical requests every five minutes, but does not explain why the information is sent to the server. AirBnB, if you are reading this, feel free to contact me or comment on this article.

Closing notes

AirBnB is probably not the only offender in this regard. It probably isn’t even the worst offender. I’m just using it to illustrate a growing trend among companies to collect everything, no matter if they need it. They may not misuse this information. They may even not use it at all. The problem is that I do not know what they are doing. And the hunger for more and more data, combined with the secrecy around what it is actually used for, makes me uncomfortable.

Transport security in online banking, or: “Not my Department”

Lately, I have been trying to improve transport security (read: SSL settings and ciphers) for the online banking sites of the banks I am using. And, before you ask, yes, I enjoy fighting windmills.

Quick refresher on SSL / TLS before we continue: There are three things you can vary when choosing cipher suites:

  • Key Exchange: When connecting via SSL, you have to agree on cryptographic keys to use for encrypting the data. This happens in the key exchange. Example: RSA, DHE, …
  • Encryption Cipher: The actual encryption happens using this cipher. Example: RC4, AES, …
  • Message Authentication: The authenticity of encrypted messages is ensured using the algorithm selected here. Example: SHA1, MD5, …
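The three parts listed above are visible in the names of classic cipher suites. The following sketch splits such a name into its components. It assumes the IANA-style naming pattern `TLS_<key exchange>_WITH_<cipher>_<MAC>` used by older CBC and stream-cipher suites; newer AEAD suites and TLS 1.3 names do not follow this layout, and the function name is made up for this example.

```python
def split_cipher_suite(name: str) -> dict:
    """Split a classic cipher suite name into key exchange, cipher and MAC.

    Only handles names of the form TLS_<kx>[_<auth>]_WITH_<cipher>_<mac>.
    """
    if "_WITH_" not in name:
        raise ValueError(f"not a classic cipher suite name: {name!r}")

    _protocol, _, rest = name.partition("_")          # drop "TLS" / "SSL"
    key_part, _, bulk_part = rest.partition("_WITH_")
    *cipher_tokens, mac = bulk_part.split("_")

    return {
        # The first token names the key exchange; a second token, if
        # present, names the signature algorithm (e.g. DHE_RSA).
        "key_exchange": key_part.split("_")[0],
        "cipher": "_".join(cipher_tokens),
        "mac": mac,
    }


print(split_cipher_suite("TLS_RSA_WITH_RC4_128_SHA"))
print(split_cipher_suite("TLS_DHE_RSA_WITH_AES_128_CBC_SHA"))
```

The first name is the kind of suite my bank started out with (RSA key exchange, RC4 encryption). The second swaps in an ephemeral Diffie-Hellman key exchange and AES.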

After the NSA revelations, I started checking the transport security of the websites I was using (the SSL test from SSLLabs / Qualys is a great help for that). I noticed that my bank, which I will keep anonymous to protect the guilty, was using RC4 to provide transport encryption. RC4 is considered somewhere in between “weak” and “completely broken”, with people like Jacob Appelbaum claiming that the NSA is decrypting RC4 in real time.

Given that, RC4 seemed like a bad choice for a cipher. I wrote a message to the support team of my bank, and received a reply that they were working on replacing RC4 with something more sensible, which they did a few months later. But, for some reason, they still did not offer sensible key exchange algorithms, insisting on RSA.

There is nothing inherently wrong with RSA. It is very widely used and I know of no practical attacks on the implementation used by OpenSSL. But there is one problem when using RSA for key exchanges in SSL/TLS: The messages are not forward secret.

What is forward secrecy? Well, let’s say you do some online banking, and some jerk intercepts your traffic. He can’t read any of it (it’s encrypted), but he stores it for later, regardless. Then something like Heartbleed comes along, and the same jerk extracts the private SSL key from your bank.

If you were using RSA (or, generally, any algorithm without forward secrecy) for the key exchange he will now be able to retroactively decrypt all the traffic he has previously stored, seeing everything you did, including your passwords.

However, there is a way to get around that: By using key exchange algorithms like Diffie-Hellman, which create temporary encryption keys that are discarded after the connection is closed. These keys never go “over the wire”, meaning that the attacker cannot know them (if he has not compromised the server or your computer, in which case no amount of crypto will help you). This means that even if the attacker compromises the private key of the server, he will not be able to retroactively decrypt all your stuff.
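A toy version of the Diffie-Hellman exchange shows why the temporary key never has to cross the wire. This is a sketch with deliberately simplified parameters (a Mersenne prime and a small base chosen for readability), not something to use for real traffic. In real TLS the server additionally signs its public value with its long-term key so that the client knows who it is talking to.

```python
import secrets

# Toy parameters for illustration only; real deployments use standardised
# groups or elliptic curves.
P = 2**127 - 1  # a Mersenne prime
G = 3


def keypair():
    """Create a fresh (private, public) pair for one connection."""
    private = secrets.randbelow(P - 3) + 2  # random value in [2, P - 2]
    public = pow(G, private, P)
    return private, public


def shared_secret(own_private: int, peer_public: int) -> int:
    """Combine our private value with the peer's public value."""
    return pow(peer_public, own_private, P)


client_private, client_public = keypair()
server_private, server_public = keypair()

# Only client_public and server_public are sent over the network.
client_secret = shared_secret(client_private, server_public)
server_secret = shared_secret(server_private, client_public)

print(client_secret == server_secret)  # both sides derive the same key
```

Both private values are thrown away when the connection ends. An eavesdropper who recorded the two public values and later steals the server's long-term key still has nothing from which to recompute the session secret.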

So, why doesn’t everyone use this? Good question. Diffie-Hellman leads to a slightly higher load on the server and makes the connection process slightly slower, so very active sites may choose to use RSA to reduce the load on their servers. But I assume that in nine of ten cases, people use RSA because they either don’t know any better or just don’t care. There may also be the problem that some obscure guideline requires them to use only specific algorithms. And as guidelines update only rarely and generally don’t much care if their algorithms are weak, companies may be left with the uncomfortable choice between compliance to guidelines and providing strong security, with non-compliance sometimes carrying hefty fines.

So, my bank actually referred to the guidelines of a German institution, the “Deutsche Kreditwirtschaft”, which is an organisation composed of a bunch of large German banking institutes. They worked on the standards for online banking in Germany, among other things.

So, what do these security guidelines have to say about transport security? Good question. I did some research and came up blank, so I contacted the press relations department and asked them. It took them a month to get back to me, but I finally received an answer. The security guidelines consist of exactly one thing: “Use at least SSLv3”. For non-crypto people, that’s basically like saying “please don’t send your letters in glass envelopes, but we don’t care if you close them with glue, a seal, or a piece of string.”

Worse, in response to my question if they are planning to incorporate algorithms with forward secrecy into their guidelines, they stated that the key management is the responsibility of the banks. This either means that they have no idea what forward secrecy is (the response was worded a bit hand-wavy), or that they actually do know what it is, but have no intention of even recommending it to their member banks.

This leaves us with the uncomfortable situation where the banks point to the guidelines when asked about their lacklustre cipher suites, and those who make the guidelines point back at the banks, saying “Not my department!”. In programming, you call that “circular dependencies”.

So, how can this stalemate be broken? Well, I will write another message to my bank, telling them that while the guidelines do not include a recommendation of forward secrecy, they also do not forbid using it, so why would you use a key made of rubber band and rocks if you could just use a proper, steel key?

Don Quixote charging the windmills by Dave Winer is licensed CC BY-SA 2.0

And, of course, the more people do this, the more likely it is that the banks will actually listen to one of us…

The sorry state of attribution in education

Lately, I’ve been dealing a bit with attribution and licensing, in part thanks to the people at Commons Machinery / elog.io. Since I started on this, I have noticed all the places where attribution is done poorly or even not at all: on flyers (“Pictures: Wikipedia / something / somethingelse”, without any indication which picture is from where, or which license they are under), on websites (often no attribution at all), and even on the slides at university.

It just boggles my mind that academic researchers in computer science, who will meticulously enter citations and go into a frenzy if they are done incorrectly by students (seriously. A slightly wrong formatting cost me several points once), think nothing of just slapping a few pictures they found somewhere onto their slides, probably without checking their license, and definitely without attributing the artist (which is almost always required by licenses, by the way). And it somewhat makes me feel like an idiot for spending the time properly attributing the two xkcd comics I used in my presentation (yes, I use xkcd in my slides. Judge me).

This is wrong. I shouldn’t feel like an idiot for spending 1 minute getting the attribution right if the artist spent a few hours creating the thing I’m using. And if the artist was nice enough to pick a license for his / her work, and if he / she was also awesome enough to pick a license that actually allows me to use their work, and all they ask in return is that I credit them while doing so, I should damn well do so. Everything else would, in my opinion, be disrespectful to both the artist and the art.

So, why is no-one attributing properly? Because it’s hard. It’s annoying. First, you have to find out if the artist even picked a license (some do not). Then, if the license allows usage, you have to find out the name of the artist. You have to write up a boilerplate text, something like “‘Like I’m five’ by Randall Munroe / xkcd.com // CC BY-NC 2.5 // Source: xkcd.com/1364”. You have to fit it into your design somehow. And then, for all you know, no one will even notice that you took the time to do so. And this assumes that you even know that you are supposed to attribute under a specific license, and how. Even professional writers like my favourite author, Patrick Rothfuss, can get this wrong. Pat wrote a blogpost and used an image from xkcd.com, without attribution. After he was notified about the missing attribution by a reader, he promptly added some attribution (which is good), but the attribution itself was still not properly done (“Comic lovelyness from the brilliant XKCD, of course” is sweet and a nice thought. It is also better than nothing. But it is not entirely correct as per the license).
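The boilerplate part, at least, can be automated. Here is a small sketch of a helper that assembles an attribution line in the format I used above. The function and parameter names are made up for this example, and the format is just my own habit, not something a license prescribes word for word.

```python
from typing import Optional


def format_attribution(title: str, author: str, license_name: str,
                       source: str, site: Optional[str] = None) -> str:
    """Build a one-line attribution: title, creator, license and source."""
    creator = f"{author} / {site}" if site else author
    return f"'{title}' by {creator} // {license_name} // Source: {source}"


print(format_attribution(
    title="Like I'm five",
    author="Randall Munroe",
    license_name="CC BY-NC 2.5",
    source="xkcd.com/1364",
    site="xkcd.com",
))
```

That still leaves the hard parts (finding the license and the artist) to you, but it takes away the excuse that typing the line is tedious.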

Don’t get me wrong. This is not about me pointing out what my favourite author did wrong. I’m just using this as an example. Attribution is hard, and while I have some hope for the work by Commons Machinery / elog.io, it’ll probably be another year at least until there is something working, moderately bug-free and usable, and adoption by the general public may never come. This is because people are not aware of the attribution problem.

And why are people not aware of it? Because almost no one is doing it right! If, for example, in university, all slides would only carry properly attributed images, people may start to wonder “what is it with all those CC BY-NC-SA’s on the slides?”. People may even start to notice if those CC BY-NC-SA’s go missing. Right now, almost no one is doing this, because almost no one is thinking about it, because almost no one is doing it. Do you see the problem?

What can you do? A few ideas:

  • Practise proper attribution. Seriously. Yes, it’s annoying, but just imagine other people using your stuff without attributing you as the original artist. Would that feel good?
  • Pick a License for your stuff. Don’t just throw it out there, pick a license and stick it on your website. Here’s a license picker for Creative Commons, which is used mostly for texts and media, and here’s a license picker for open software licenses. Choose a license, stick it on your work, and you make the life of people like me easier (This work is licensed CC BY 4.0, by the way, as you can see in the sidebar). Bonus points if you inform yourself about the advantages and drawbacks of the different licenses. For example, choosing “Non-Commercial use only” licenses may have unintended consequences, like keeping others, including non-profits, from using your work on their pages.
  • Get involved with Commons Machinery, and register as a beta-tester for elog.io. They can always use more hands and brains, and it looks like their stuff is going in the right direction.
  • Ask questions. Tell your regional newspaper that “Picture: Wikipedia” is not a proper attribution. Ask your professor why the images are not attributed. Raise some awareness.
  • Practise proper attribution. Did I already mention this? Oh, well, it bears repeating.

So, that’s it for today’s semi-rant. I’m looking forward to seeing proper attribution from all of you, and I will probably send an email to my professors about this tomorrow.

Those Wonderful Evenings or: What can go wrong, will

It is the general consensus that IT Problems are pack animals. This evening once again proves this.

It all began in a completely harmless fashion: For a week or so, I had been having problems with my work eMail Account. The Organization I work for hosts their eMail Infrastructure at Microsoft, Office365 to be precise. This enables us to use the full Exchange functionality at no cost (Since it is an educational institution, Office365 is free for us). Personally, I could not care less about that, since I am using Linux exclusively, but hey, as long as I can get my mails, whatever.

So, about a week ago, Thunderbird started reporting errors when connecting to my eMail Account. It babbled something about the login to the server failing, offering me to Retry, Change my password or cancel. I hit retry a few times, and eventually the message stopped. I assumed Thunderbird had finally managed to get a connection with the Server and happily kept doing whatever I was doing. I never wondered why I wasn’t getting any mail, since it was very rare for me to receive any mail on that account anyway. This was mistake number one: Assuming that Thunderbird would not stop trying to connect without explicitly telling me about it.

This impression was intensified by the fact that my daily backups of my mail accounts were running just fine. My Cronjob reported no errors. And, at the beginning, my Android Phone running K9 Mail kept getting the messages I got sent. The phone did not report any new mails either, and no errors, so I assumed that everything was fine, since even if Thunderbird was silently stopping the connection attempts, and even if my daily backup was having unreported errors, at least the phone would surely complain if it was unable to get a connection with the servers. Triple redundancy in error reporting, what could possibly go wrong?

Famous last words.

So, today, the error messages of Thunderbird finally pissed me off enough for me to investigate. Our eMail Service had recently been migrated (by Microsoft) from Live@Edu to Office365. The Documentation for the upgrade claimed that no changes would have to be made, so I left my Settings the way they were, and they kept working happily through the upgrade, even permitting me to send a notification about the completed migration to the organization and to fix a few Problems that occurred afterwards. That was all before my problems set in.

So, as I said, I started to investigate. I tried to find out which servers I was supposed to use, and updated my Thunderbird config. The problems were still there. Curious, I logged into webmail to check if my account was still active and my password still worked. It was and did. After the login, I was greeted by “9 new messages”, the oldest going back to last Monday.

I will not bore you with my struggles to get Thunderbird working. I triple checked password and server settings, changed my password, waited 30 minutes, nothing would work.

Curious how my Android had kept working through all of this (or had it? I had never seen those messages after all), I started up K9 Mail and tried to refresh my account. It went through without error message, but also without downloading the new messages. I updated the server information and suddenly, I got an error message, claiming a wrong password. Great. After deleting and re-creating my K9 Mail Config for the account, I still could not get it to work. K9 Mail had not been able to connect to the server for a whole week, but had not seen fit to inform me of that. Awesome.

Now, I was really interested in how my backups had kept working through all of this. I manually ran my backup software (I was using OfflineIMAP), only to see that the Program was throwing an exception when trying to connect to my account. The exit status (“echo $?”) was still zero though, indicating success. Frustrated, I hit up their GitHub-Page, intending to write a bug report, when I realized that I was running a horribly outdated version that I had installed from the Raspbian-Repositories (Debian for Raspberry Pi). I removed the old version, installed the current one, and retried the run, being met with an Error about cert_fingerprints not being set. The Program still exited with 0, by the way, even though someone who was running an automatic update of the program using apt-get, for example, would never have seen this change, and thanks to the success indicator of the exit status, would have never been notified that his backups were failing. I wrote up a bug report, fixed the config file, and tried again. Now, I was getting the “LOGIN FAILED” I expected, but the Program STILL exited with a Status of zero. I sighed heavily (actually, I cursed loudly), updated my bug report and mailed Microsoft Support about this problem.
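Since the exit status alone turned out to be useless here, a cron wrapper can look at the output as well. The following is a minimal sketch of that idea. The list of error markers is my own guess at what to look for, not an official list of OfflineIMAP messages, and the "backup" command is a stand-in that fails in the same way (it prints an error and then exits with 0).

```python
import subprocess
import sys

# Assumed markers; adjust them to whatever your tool actually prints.
ERROR_MARKERS = ("error", "exception", "traceback", "login failed")


def run_checked(command):
    """Run a command and distrust a zero exit status if the output looks bad.

    Returns (ok, hits): ok is False when the exit status is non-zero or
    when any error marker shows up on stdout or stderr.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    output = (result.stdout + result.stderr).lower()
    hits = [marker for marker in ERROR_MARKERS if marker in output]
    ok = result.returncode == 0 and not hits
    return ok, hits


# Stand-in for a backup run that reports an error but still exits with 0.
fake_backup = [
    sys.executable, "-c",
    "import sys; print('LOGIN FAILED', file=sys.stderr); sys.exit(0)",
]

ok, hits = run_checked(fake_backup)
print(ok, hits)
```

Matching on output is crude and can produce false alarms (a mail folder named "Errors" would trigger it), but a false alarm in a backup job is much cheaper than a week of silently missing backups.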

It has been two hours now, and I have found:

  1. One case of bad coding in Thunderbird (not reporting when stopping the connection attempts)
  2. One case of missing error messages in K9 Mail
  3. Two cases of a potentially fatal wrong exit status on OfflineIMAP
  4. One case of WTF about Microsoft (Seriously, why doesn’t this crap work?)
  5. One case of foul mood and desire to punch cute kittens

Lessons Learned:

  1. Don’t rely on error messages being there if you have never seen them
  2. Don’t rely on the exit status of software you have not written yourself and / or tested.
  3. Don’t be sure that since you have three different ways of being notified when something goes wrong, you actually will, unless you have tested at least one of them (Basically 1 and 2 combined)
  4. Even (or: especially) a billion dollar company like Microsoft can and will screw up, and they will probably not fix it if you do not complain.