Data scraping, information gathering activity

We are in the era of the data economy, in which data represent a real factor of production. Companies whose activity is based on data processing need automated tools (in other words, ICT tools) to collect data from sources on the internet which, by their nature, are not structured: unstructured data on the web is usually represented in HTML format.

The practice known as data scraping is an information gathering activity, usually conducted online, that uses dedicated software to extract data, including personal data, from websites, databases, files or the body of electronic communications sent by e-mail. Consider, for example, the extraction of texts from social networks for the purposes of sentiment analysis or to implement data-driven marketing strategies.
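
As a rough illustration of what "extracting data" from HTML means in practice, the following minimal Python sketch turns a small HTML fragment into a structured list of texts. Everything in it is an assumption made for the example: the sample markup, the `post` class name and the names `PostExtractor` and `extract_posts` are hypothetical and do not come from any real site or tool. The sketch parses only a local string and makes no network requests.

```python
from html.parser import HTMLParser


class PostExtractor(HTMLParser):
    """Collects the text of every <p class="post"> element (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self.posts = []         # structured output: one string per post
        self._in_post = False   # True while inside a matching <p>
        self._buffer = []       # text fragments of the current post

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "p" and ("class", "post") in attrs:
            self._in_post = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_post:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._in_post:
            self.posts.append("".join(self._buffer).strip())
            self._in_post = False


def extract_posts(html):
    """Return the texts of all elements marked as posts in the given HTML."""
    parser = PostExtractor()
    parser.feed(html)
    parser.close()
    return parser.posts


# Hypothetical sample page: two user posts and one unrelated paragraph.
SAMPLE_HTML = """
<html><body>
  <h1>Reviews</h1>
  <p class="post">Great service, would recommend.</p>
  <p class="ad">Sponsored content</p>
  <p class="post">Delivery was <b>late</b> again.</p>
</body></html>
"""

print(extract_posts(SAMPLE_HTML))
```

The resulting list could then be passed to a sentiment analysis step. Whether collecting such texts from a real site is lawful depends on the considerations discussed in the rest of this article.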

More specifically, the collection can take place by means of software that simulates the navigation performed by real users in order to filter and acquire data, in some cases by circumventing the security measures adopted by the providers who manage such data. Information acquired in this unauthorized manner may then be processed further, for example to build structured databases and, in the case of personal data, to create specific personal profiles.

With regard to personal data, it is worth recalling the intervention of the Guarantor for the protection of personal data (the Italian data protection authority) which, in 2016, declared unlawful the processing carried out by the operator of a website, consisting of ‘fishing’ online in a systematic and indiscriminate way for data and information in order to create a telephone directory (see Provision no. 4/2016). In this regard, the Guarantor observed that companies intending to create such a directory must use the single database (DBU), i.e. the electronic archive that collects the telephone numbers and other customer data of all national fixed and mobile telephony operators. Alternatively, to process data obtained by scraping, these companies must obtain from the data subjects a consent that is free, informed and specific for each purpose they intend to pursue with that processing.

In its provision, the Guarantor reaffirmed the rules on the creation of telephone directories and considered the online publication of a telephone directory that is not built from the DBU and lacks the consent of the data subjects to be particularly invasive processing, because the data can easily be found even through the most common search engines. More generally, the scraping of personal data must comply with the rules established by Regulation (EU) 2016/679 (GDPR).

A recent case of scraping concerned the social network Facebook (together with Twitter and Instagram), as well as other big tech companies such as Amazon, YouTube and LinkedIn, and involved the activity of two companies, BrandTotal Ltd. and Unimania Inc. These companies implemented a data collection system that violated the terms and conditions of service of the aforementioned platforms, ultimately amounting to unauthorized access.

In particular, in the case of Facebook, a series of browser ‘extensions’ called “UpVoice” and “Ads Feed”, which simulate human web browsing, made it possible to overcome the security measures the social network had already adopted in response to previous attempts to breach personal data. Installing and using the extensions gave the two companies access to social profiles and other information, including personal information, so that they obtained a large amount of data without either the consent of the data subjects or the authorization of the host platform.

Data scraping, therefore, is not illegal per se: Google, through its ‘parser’ programs, uses scraping methods to analyze websites and extract the contents it will then use for its own indexing. It can, however, constitute unlawful conduct when it translates into unauthorized and indiscriminate access to other people’s data. Access to personal data, commercial information, industrial projects, know-how and other value-added data evidently amounts to a breach of security in terms of confidentiality, which can have an impact on the rights of natural persons (privacy, opinion, non-discrimination) and legal persons (intellectual property, industrial property, economic initiative).


Assodata and Isdifog
