This article was updated on 28 May 2019 to include a follow-up response from Chtrbox.
A security researcher has discovered a massive cache of data for millions of Instagram accounts, publicly accessible for everyone to see. The cache included sensitive information that would be useful to cyberstalkers, among others.
A security researcher calling themselves anurag sen on Twitter discovered the database hosted on Amazon Web Services. It had over 49 million records when discovered and was still growing before it was deleted.
The Instagram data included user bios, profile pictures, follower numbers and location, all of which is viewable online. What’s more puzzling is that it also contained the email address and telephone number used to set up the accounts, according to TechCrunch, which broke the story.
Reporters identified the owner of the database as Mumbai-based social media company Chtrbox. It pays social media influencers to publish sponsored content through their accounts. The database has since disappeared from Amazon.
Response from Chtrbox
Chtrbox took issue with press coverage of the leaked records, sending Naked Security the following statement:
The reports on a leak of private data are inaccurate. A particular database for limited influencers was inadvertently exposed for approximately 72 hours. This database did not include any sensitive personal data and only contained information available from the public domain, or self reported by influencers.
We would also like to affirm that no personal data has been sourced through unethical means by Chtrbox. Our database is for internal research use only, we have never sold individual data or our database, and we have never purchased hacked-data resulting from social media platform breaches. Our use of our database is limited to help our team connect with the right influencers to support influencers to monetize their online presence, and help brands create great content.
Updated 28 May 2019:
Several days after we initially contacted it, Chtrbox sent a follow-up statement saying that only 350,000 users had been affected, and that the information “was non-sensitive publicly available data, or in some cases self-reported data by Chtrbox users.” Facebook also reportedly said that no private emails or phone numbers of Instagram users were accessed, which suggests that phone numbers in the database may have been made public by the accounts’ owners.
The database in question was “a secondary database, collated by us, containing public data that our internal team refers to discover influencers,” Chtrbox added. [end of updated text]
How might someone compile a massive database of Instagram information?
The company wouldn’t answer any more questions, so it’s difficult to know for sure. User names, profile shots, and follower numbers are publicly available and could be gathered by screen scraping. Screen scrapers use automated scripts to visit websites and copy the information they find there.
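To make the idea concrete, here is a minimal sketch of what a screen scraper does once it has fetched a page. It is not how this particular database was built, which nobody outside the company knows. The page markup, the class names and the ProfileScraper and scrape_profile helpers are all invented for illustration, and the example parses a hard-coded string rather than visiting any real website.

```python
from html.parser import HTMLParser

# Hypothetical public profile page. The markup and class names are invented
# for illustration and do not reflect any real site's HTML.
SAMPLE_PAGE = """
<html><body>
  <h1 class="username">example_influencer</h1>
  <p class="bio">Travel and food.</p>
  <span class="followers">12345</span>
</body></html>
"""


class ProfileScraper(HTMLParser):
    """Collects the text of elements whose class attribute is in WANTED."""

    WANTED = {"username", "bio", "followers"}

    def __init__(self):
        super().__init__()
        self.profile = {}
        self._current = None  # name of the field being read, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class in self.WANTED:
            self._current = css_class

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than overwrite.
        if self._current is not None:
            self.profile[self._current] = self.profile.get(self._current, "") + data

    def handle_endtag(self, tag):
        # The sample markup has no nested tags, so any closing tag ends the field.
        self._current = None


def scrape_profile(html_text):
    """Return a dict of the wanted fields found in html_text."""
    parser = ProfileScraper()
    parser.feed(html_text)
    parser.close()
    return {field: text.strip() for field, text in parser.profile.items()}


profile = scrape_profile(SAMPLE_PAGE)
print(profile)
```

A real scraper would wrap something like this in a loop that downloads one page after another, which is where the load on the publisher's servers comes from.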
Companies use scraped data for all kinds of purposes, such as price comparisons and sentiment analysis. Many publishers consider scraping malicious and try to block it, because the scrapers are using their proprietary data and also draining their server resources.
We’ve seen people scraping Instagram before. Redditors attempted to archive every image from the site that they could, for kicks.
But scraping can get you into trouble. Authorities in Nova Scotia, Canada, arrested a 19-year-old for scraping around 7,000 freedom-of-information releases from a public web site there, calling him a hacker. They subsequently dropped the charges.