DATA UNSOUGHT IS GOOD, BUT GIVEN UNSOUGHT IS BETTER – DO WE REQUIRE A FAIR, REASONABLE AND NON-DISCRIMINATORY LICENSING SYSTEM FOR BIG DATASETS IN THE INFORMATION AGE?

By Sarvagya Chitranshi¹

ABSTRACT

Data has undoubtedly acquired essential importance in the information age. However, this data is largely controlled in a monopolised fashion by certain big tech companies, which raises questions about the legitimacy of their transactions and the manner in which they affect competition in the digital space. It is therefore essential to treat data as a commodity that equips such companies with nearly undisputed dominance, and to allow it to be traded under equitable instruments. Abuses of the dominance acquired by such companies have been addressed before, but the responses have generally been reactions to a new-age problem within the existing legal framework rather than a clear solution. This research paper therefore focuses on how data has risen to be a fundamental structural entity for digitally operating companies and the manner in which it raises pertinent issues with respect to the spirit of competition. It analyses the market position of business entities that command control over such data and the impending consequences of that control. It thereafter proposes a model of fair, reasonable and non-discriminatory licensing for essential datasets and the manner in which it should be regulated. Given the lack of judicial rulings or opinions on the issue, it evaluates whether a jurisprudence around this can be, and more so should be, developed. It then delves deeper into the possible consequences of such licensing and addresses general concerns around effects on consumers and privacy. It weighs the moral obligations against the practical considerations to be made with respect to the management of big data and its usage in the modern world.

INTRODUCTION

Digital data has undoubtedly grown at an unprecedented rate in recent years. The organisation of such data, relating to everything from an individual’s habits to a community’s behaviour, is the underlying basis of various recently created and emerging sectors of the economy. In fact, it is generally regarded as the foundation of the present digital economy’s infrastructure.² Tech giants build various products directly with the aid of organised data analytics, and even companies in sectors like FMCG³ or industrial development⁴ regard it as a primary tool for their future progress.

It is in such circumstances that the treatment of datasets as a resource and a commodity must be examined clearly, especially under competition law. Deals between companies, such as the Facebook-Jio deal,⁵ raise doubts based not only on the parties’ dominance in terms of market share but also on their arsenal of information. The underlying feature of their dominance is their giant repository of user data, which is nearly inaccessible to their competitors.⁶

Therefore, it is essential for competition law to look into the regulation of such deals, not necessarily from a restrictive point of view but with the aim of promoting equity and thereby fair competition. This essay delves deeper into the need for such consideration, the requisites for creating a fair and reasonable licensing mechanism for datasets and the possible consequences of implementing it.

THE CIRCUMSTANCES AND THE NEED FOR SUCH CONSIDERATION

Having arrived at a preliminary realisation that data has become a valuable commodity that needs to be regulated, it is important to develop a deeper understanding of how it is growing into one of the most essential prized possessions of the information age.

BIG DATA – INFORMATION ON THE RISE

“Data is the new oil. It’s valuable, but if unrefined it cannot really be used. It has to be changed into gas, plastic, chemicals, etc to create a valuable entity that drives profitable activity; so, must data be broken down, analysed for it to have value.” — Clive Humby, 2006.⁷

It is essential to understand what Big Data stands for and how datasets based on it are prepared for further use by the tech giants. Three essential features are associated with how we understand or classify Big Data, along with two others which, although not essential, are becoming increasingly typical. These five features are volume, velocity, veracity, variety and value.⁸ The first is volume, the primary distinguishing factor: as the name suggests, Big Data is larger and comprises more data points than other datasets. Since a lot of this data is automatically collected, organised, stored and, in many cases, even generated, its veracity is important. Velocity refers to the pace at which such sets are created from the collected information.⁹ Variety refers to the diversity and types of data collected, along with the sources from which it is collected. Lastly, value is derived from all these features: upon their sufficient presence, a dataset or a Big Data corpus gains value.

In 2011, Peter Sondergaard, senior vice-president of Gartner, took Humby’s concept further. He famously remarked that,

“Information is the oil of the 21st century, and analytics is the combustion engine.”¹⁰

This summarises the immense importance of data science in contemporary times¹¹ and, moreover, stresses its value as a formative entity upon which further innovations can be made. This makes it quite similar to standard essential patents (discussed below) in its application and usage.

However, this also means that the dataset alone does not supply everything that innovation requires. It is merely the minimum requirement, the basic essential that is then analysed according to the purpose it is being used for or the product that is being developed with its aid. This lays the ground for treating it primarily as a fundamental requirement that aids competition, rather than as a self-generated innovation that must not be shared at any cost.

THE REQUIREMENTS FOR IMPLEMENTATION

It is clear from the discussion so far that information encapsulated in datasets needs clear regulation like other commodities covered under competition law (e.g., patents). However, it is important to ascertain whether existing licensing systems, particularly FRAND, apply to datasets as well as they do to standard essential patents (SEPs). After that, we need a strong foundation that justifies the intervention of competition authorities or policy makers in the matter. It is also important to ascertain whether datasets in particular meet certain requirements to be considered as fundamentally important and exclusive as essential patents. Finally, it is imperative to develop a jurisprudence in this regard that addresses the void in the judicial understanding of issues related to data and datasets.

FRAND LICENSING AND ITS APPLICABILITY WITH SEPS – CONSIDERING THE PRINCIPAL SIMILARITY

It is essential to understand how FRAND licensing works currently and then apply a similar structure to our context.

What is an SEP?

The International Organization for Standardization defines a formal standard as “a document established by a consensus of subject matter experts and approved by a recognized body that provides guidance on the design, use or performance of materials, products, processes, services, systems or persons”¹² (see also ISO/IEC Guide 2:2004, Standardization and related activities). A Standard Essential Patent is therefore a patent that protects a technology which is necessary for implementing a standard. Inventions that must be used to implement a particular standard process are hence covered under its ambit. Since their presence is imperative to nearly everyone in a particular sector or working on variations or types of a similar product, a fair, reasonable and non-discriminatory (FRAND) patent license is issued for their use: an agreement between the holder of the essential patent and the person looking to use it.

What is FRAND?

FRAND means fair, reasonable and non-discriminatory. The goal of FRAND is to strike the correct balance between the interests of technology users and those of technology providers. As a result, the interests of both parties (patent licensor and licensee) should be addressed during negotiations. The FRAND value should not be so high that it makes the implementer’s business unsustainable, nor so low that it becomes difficult or impossible for the technology supplier to reinvest in R&D to produce new technological breakthroughs.¹³

This essay suggests that a similar structure should be devised for datasets. The primary similarity between datasets and patents is their fundamental use in maintaining the standard of a particular product or service. This is evident from the fact that the vast majority of such data is held by giant tech corporations, which restricts the entry of new players.

DOMINANT POSITION AND ITS ABUSE

Large datasets are definitely a competitive advantage¹⁴, but it is essential to analyse to what extent and, more so, whether they have led to a dominant position for certain companies and whether those companies have misused it. The FTC states that the purpose of antitrust laws is to “prohibit conduct by a single firm that unreasonably restrains competition by creating or maintaining monopoly power.”¹⁵ A dominant position is the power of an undertaking to behave to an appreciable extent independently of its competitors, its customers and consumers.¹⁶ It is a known principle in competition law that a dominant position is not necessarily an issue for regulatory authorities, but its abuse certainly is.¹⁷ The EU covers it under Article 102 of the TFEU¹⁸ and the US prohibits “monopolization leading to an anticompetitive outcome” under Section 2 of the Sherman Act.¹⁹ EU jurisprudence also recognises that a dominant undertaking has a special responsibility not to allow its behaviour to impair genuine competition.²⁰

It is in this context that the conduct of tech giants must be assessed. Centralized ownership of datasets enables them to reshape markets²¹ in a manner that might not necessarily be advantageous to the end user. An abuse of dominant position has already been found in the case of Google²² in Europe. Beyond specific cases like this, the Big Tech Antitrust Report²³ by the US Congress characterised the activities of the big tech companies as anti-competitive. The fundamental basis of such an anti-competitive advantage is largely the enormous database that the companies hold. Every product and service built upon it furthers the unfair advantage and makes it increasingly difficult for new entrants to introduce competitive products.

It is therefore necessary to establish a reasonable and fair transfer of datasets, to be used as fundamental building blocks for maintaining the standard of services provided in such monopolized sectors. However, it is also important to ascertain whether these datasets actually pass the threshold for being licensed in such a manner. They must be (a) unique and (b) actually difficult to acquire and to recreate.

UNIQUENESS AND THE ABILITY OF COLLECTION

Now, for datasets to be accorded a similar level of transactional value, it has to be ascertained whether they possess three essential characteristics:

Are they unique?
Is it difficult for other companies to collect and create similar datasets for their requirements?
Do companies find the need to access the very same dataset for creation of other products and services?

The lack of judicial attention or academic literature on this front creates a void in ascertaining whether these questions can be answered positively. However, the questions do lay out a preliminary test to assess whether a particular dataset can be treated as “essential” and whether the product or service it is being used for should follow a manner of “standardization.” The evaluation criteria do not offer a conclusive measurement of any particular dataset, but they provide a framework that should be followed in evaluating any big data collection. Statistical answers to the stated questions would allow an appropriate judgement of whether a particular dataset is eligible for the licensing under consideration.

DEVELOPING A JURISPRUDENCE

The primary concern regarding the reasonable trade of datasets or the determination regarding their impact on overall competition in the digital space is the clear lack of both academic and judicial literature. This compels us to go through the facts and propose certain models for better understanding and research in the sector.

From the point of view of the companies, it has been proposed that private actions by the industry may be more effective than statutory interventions.²⁴ This is after we have established that neither dominance arising from standardized datasets nor its abuse is imaginary or unprecedented. It is even possible to protect data by an overlapping patchwork of different intellectual property rights.²⁵ This can be achieved by enlarging the ambit of such instruments and establishing their application to datasets. We are looking at a combination of existing rights or even the development of a new set of intellectual property rights for datasets. Given that such rights exist, it is entirely possible to recognise the imperative importance of datasets in creating digital products. The European Union protects the content of a database under a sui generis right, while the structure itself is protectable under copyright.²⁶ The sui generis right recognises the importance of the effort put into collecting and preparing the database itself and is therefore distinct from other intellectual property rights, where originality is considered a very important factor. Given this recognition, popular products or services that are indispensable for a digital product must be standardised, and the datasets used in augmenting their functionality must carry a FRAND license. Examples would include specific sections of region-based behaviour data on e-commerce sites or healthcare datasets for hospitals.²⁷

There is a further need for parliaments, courts and legal luminaries to conclusively determine the essentiality of datasets, their role in the development of digital products and other related issues. There is very little actual legal opinion on these issues, and this essay attempts to steer the discussion in this direction. Upon this determination, we can recognise essential datasets according to both their information and their organizational structure. These can then be traded under a clear and reasonable deal that provides new entrants with the requisite datasets, or portions of them, to prepare standardized products.

It is also important to discuss the Open Access system here, to stress that the wheels are already in motion for giving equitable access to datasets. The Open Access movement recognises the key role of datasets for various institutions and researchers, along with their application in derivative works. A contribution to this cause is made by Creative Commons, a non-profit organisation that offers three types of licenses compatible with Open Access. These are:

A waiver: CC0 v.1.0
Attribution of the original owner: CC-BY v.4.0
Attribution and share-alike for the dataset and works derived from it: CC-BY-SA v.3.0²⁸.

Along with this, there is also a governmental Open Access license, OGL v.3.0, the Open Government Licence²⁹ under English jurisdiction. This essentially covers public sector information and gives acknowledgement to the original owner. The structure of a database is also protected under ODC-BY v.1.0³⁰ in cases where a dataset includes a database.

Although these licenses are not directly related to a FRAND structure, their existence indicates the demand among companies to obtain datasets for their own use. Since there is a lack of appropriate jurisprudence, it is important for both technical and judicial experts to work on a licensing system and draw on existing systems such as Open Access licensing. Such a system promotes a culture of forward-directed innovation with more transparency and, essentially, better opportunities for new entrants.

POTENTIAL CONSEQUENCES

This essay does not stop at assessing the advantages that FRAND agreements for datasets bring into the picture. It also makes an informed attempt at ascertaining the effect that such agreements will have on competition in the market, on final consumers and, most importantly, on privacy.

HOW DOES IT AFFECT THE COMPETITION?

It is not our case that an industry should be tampered with by neutralising its product. The proposition here is that we have to move past the stage where trading in data is the sole underlying business model of a corporation, and instead promote competitive behaviour by providing the basic essentials, i.e., datasets. The abuse of the dominant position acquired by tech giants³¹ is evident from the cases filed against them by the US government³² and even the European competition authorities.³³ This compels us to provide equity in the system and limit the compounding effect that translates into unfair dominance. FRAND licensing of datasets is an instrument that does the job here.

Nor does it kill competition, since what is essential to getting ahead of one’s competitors is the analysis of data and not its mere existence.³⁴ By trading basic datasets as something everyone should have, we help companies obtain the raw material, making the real competition about who works better on top of it. If certain processes or inclusive products are standardised for digital products, just as they are for their physical counterparts, this not only raises the standard of competition but also spares a new entrant from having to make its way up from scratch.

THE CAUSE OF PRIVACY

Another important concern about the sharing of datasets relates to a first principle: the privacy of the users to whom the data belongs. It is our case, first, that consumers are materially better off if their data does not exist solely as a commodity in a free market, and thus the balance tilts in favour of executing the idea put forward here. Secondly, privacy in the digital space is in itself a separate matter that needs to be adequately covered by appropriate legislation, irrespective of the discussion here. In fact, it is our case that the substantial dominance gained by a few tech giants is the very reason such privacy issues arise in the first place.³⁵ The Facebook-Cambridge Analytica case³⁶ is an example of how monopoly over data can lead to breaches that are not even realised until much later. Also, licensing does not come without a surrounding jurisprudence that legally restricts its abuse or misuse. There has to be a categorisation of what kind of dataset is safe to be traded, depending on the standardised products and services and on the intended use of that data. Legitimate licensing of necessary datasets makes it possible to track the transactions that involve them and provides a legitimate route to license one if needed. Therefore, from the point of view of protecting privacy by preventing its abuse, FRAND licensing of datasets helps the cause.

DOES THIS BENEFIT THE CONSUMERS?

This section is an analytical summation of the entire essay. The ultimate objective of policies made under commercial laws is to benefit consumers. Competition law recognises as one of its causes the enhancement of competition in the market, so that there is provision of choice, incentive for improvement and ease of entry for new players. This is largely what FRAND licenses for datasets provide: better services, more competition and substitutable products. This in particular keeps the tech giants in check and reduces the possibility of a single company abusing its dominant position. Consumers get to express their preferences and can expect better products, given that the competition is based not on acquiring the basic necessities for digital products but on what is made on top of or out of them. It also pushes forward the necessary cause of developing commercial jurisprudence around data and digital products themselves. This in turn arguably provides consumers and the common man with a moral lens through which to view the activities of a tech company. Finally, it can safely be said that a FRAND-based licensing system for standardized datasets could produce many constructive outcomes for consumers, and it demands appropriate legal thought.

CONCLUSION

We have discussed the growth of data and its assorted forms, which provide a dominant advantage largely to the first movers in the industry. We have also seen a substantial number of instances where this dominance has been abused, and there is a clear possibility of it happening again with more dire consequences. Therefore, datasets, the fundamental requirement for preparing digital products and offering digital services, must be governed by a fair, reasonable and non-discriminatory form of licensing. This takes away the monopolistic power vested in a company merely by virtue of its being the first mover and reduces the self-reinforcing, compounding nature of such dominance. It shifts the basis of competition from hoarding information to innovation. It would encourage tech giants to work on more constructive products rather than concentrating on the acquisition and appropriation of user data and claiming a high price for it.

We have gone over the possible consequences of such licensing, primarily concentrating on how it will impact competition in the market, how it will deal with privacy issues and, ultimately, whether it will benefit consumers. Given the analysis of the first two questions, the answer to the third is in the affirmative. In conclusion, it is imperative to develop a jurisprudence on the subject by virtue of its deep impact on present-day society.

¹ Student at Gujarat National Law University

² OECD, Data-Driven Innovation: Big Data for Growth and Well-Being (OECD 2016) 181

³ Bhragu Haritas, ‘How Godrej is using data analytics to transform critical business functions’ (The Economic Times 13 November 2019) <https://cio.economictimes.indiatimes.com/tag/subrata+dey> accessed 16 October 2021; Ali Kidwai, ‘Top 5 Analytics Use Cases in FMCG Industry’ (Polestar 23 November 2020) <https://www.polestarllp.com/analytics-use-cases-fmcg-industry> accessed 16 October 2021

⁴ RGBSI, ‘The Role of Big Data Analytics in Industry 4.0’ (RGBSI) <https://blog.rgbsi.com/big-data-analytics-in-industry-4.0> accessed 16 October 2021

⁵ Abhishek TK, ‘Analysis of Competition Law Issues in the Facebook-Jio Deal’ (Enhelion Blogs 24 November 2020) <https://enhelion.com/blogs/2020/11/24/analysis-of-competition-law-issues-in-the-facebook-jio-deal/#_ftn1> accessed 16 October 2021

⁶ Maurice E. Stucke, ‘Here Are All the Reasons It’s a Bad Idea to Let a Few Tech Companies Monopolize Our Data’ (Harvard Business Review 27 March 2018) <https://hbr.org/2018/03/here-are-all-the-reasons-its-a-bad-idea-to-let-a-few-tech-companies-monopolize-our-data> accessed 16 October 2021

⁷ Charles Arthur, ‘Tech giants may be huge, but nothing matches big data’ (The Guardian 23 August 2013) <https://www.theguardian.com/technology/2013/aug/23/tech-giants-data> accessed 16 October 2021

⁸ Jenn Cano, ‘The V’s of Big Data: Velocity, Volume, Value, Variety, and Veracity’ (XSNet 11 March 2014) <https://www.xsnet.com/blog/bid/205405/> accessed 16 October 2021

⁹ Uwe Rattay, ‘Untersuchung an vier Fahrzeugen – Welche Daten erzeugt ein modernes Auto?’ (ADAC 2016) <https://www.adac.de/infotestrat/> accessed 16 October 2021

¹⁰ Michael Palmer, ‘Data is the New Oil’ (ANA 3 November 2006) <https://ana.blogs.com/maestros/2006/11/data_is_the_new.html> accessed 16 October 2021

¹¹ Amol Mavuduru, ‘Is Data Really the New Oil in the 21st Century?’ (Towards Data Science 12 December 2020) <https://towardsdatascience.com/is-data-really-the-new-oil-in-the-21st-century-17d014811b88#:~:text=Clive%20Humby%2C%20a%20British%20mathematician,took%20this%20concept%20even%20further.> accessed 16 October 2021

¹² International Organization for Standardization, ‘Standards’ (International Organization for Standardization) <https://www.iso.org/standards.html> accessed 16 October 2021

¹³ Nikieta Aggarwal, ‘What is FRAND Licensing all About: Top 10 Points to Keep in Mind’ (IPleaders 15 February 2017) <https://blog.ipleaders.in/frand-licensing/> accessed 16 October 2021

¹⁴ Winterberry Group, Data as Competitive Advantage (White Paper, October 2015)

¹⁵ Federal Trade Commission, ‘Monopolization Defined’ (Federal Trade Commission) <https://www.ftc.gov/tips-advice/competition-guidance/guide-antitrust-laws/single-firm-conduct/monopolization-defined> accessed 16 October 2021

¹⁶ Hoffmann-La Roche & Co. AG v Commission of the European Communities [1979] ECR 1979 – 00461

¹⁷ Competition Commission, Provisions relating to Abuse of Dominance (Competition Commission of India 2020)

¹⁸ Deutsche Telekom AG v. Commission [2010] C-280/08

¹⁹ Sherman Act 1890 s. 2

²⁰ Deborah Healey, ‘Abuse of Dominant Position’ (Concurrences 2018) <https://www.concurrences.com/en/dictionary/abuse-of-dominant-position-en#references> accessed 16 October 2021

²¹ Jaron Lanier, Who Owns the Future? (Simon & Schuster 2014)

²² European Commission, ‘Antitrust: Commission fines Google €2.42 billion for abusing dominance as search engine by giving illegal advantage to own comparison-shopping service’ (European Union 27 June 2017) <https://ec.europa.eu/commission/presscorner/detail/en/IP_17_1784> accessed 16 October 2021

²³ Subcommittee on Antitrust, Commercial and Administrative Law of the Committee on the Judiciary, Investigation of Competition in Digital Markets (US House of Representatives 2020)

²⁴ Robert P. Merges, ‘Property Rights Theory and the Commons: The Case of Scientific Research’ [1996] 13 SIPP 145

²⁵ B. Hugenholtz, ‘Data property: Unwelcome guest in the House of IP’ (2018) <https://www.ivir.nl/publicaties/download/Data_property_Muenster.pdf> accessed 15 October 2018

²⁶ Your Europe, ‘Database protection’ (European Union 17 August 2021) <https://europa.eu/youreurope/business/running-business/intellectual-property/database-protection/index_en.htm> accessed 16 October 2021

²⁷ CPrime Studios, ‘10 Best Healthcare Data Sets (Examples)’ (CPrime Studios) <https://cprimestudios.com/blog/10-best-healthcare-data-sets-examples> accessed 16 October 2021

²⁸ Creative Commons, ‘Attribution 4.0 International (CC BY 4.0)’ (Creative Commons) <https://creativecommons.org/licenses/by/4.0/> accessed 16 October 2021

²⁹ National Archives UK, ‘Open Government License for public sector information’ (National Archives UK) <https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/> accessed 16 October 2021

³⁰ Marcel Kohpeib, ‘A Short Introduction to Datasets, Licensing and the Current EU Copyright Reform’ (Intellectual Property Society at the University of Glasgow 9 July 2018) <https://gu-ips.org/index.php/2018/07/09/a-short-introduction-to-datasets-licensing-and-the-current-eu-copyright-reform-by-dimitra-iordanidou/> accessed 16 October 2021

³¹ Tony Romm, Cat Zakrzewski, Rachel Lerman, ‘House investigation faults Amazon, Apple, Facebook and Google for engaging in anti-competitive monopoly tactics’ (The Washington Post 6 October 2020) <https://www.washingtonpost.com/technology/2020/10/06/amazon-apple-facebook-google-congress/> accessed 16 October 2021

³² Steven Pearlstein, ‘Facebook and Google cases are our last chance to save the economy from monopolization’ (The Washington Post 18 December 2020) <https://www.washingtonpost.com/business/2020/12/18/google-facebook-antitrust-lawsuit/> accessed 16 October 2021

³³ European Commission, ‘Antitrust: Commission sends Statement of Objections to Google on Android operating system and applications’ (European Commission 20 April 2016) <https://ec.europa.eu/commission/presscorner/detail/en/IP_16_1492> accessed 16 October 2021; European Commission, ‘Antitrust: Commission takes further steps in investigations alleging Google’s comparison shopping and advertising-related practices breach EU rules’ (European Commission 14 July 2016) <https://ec.europa.eu/commission/presscorner/detail/en/IP_16_2532> accessed 16 October 2021

³⁴ Jeremy Goldman, ‘How Companies Like Amazon and Google Turn Data Into a Competitive Advantage — and How You Can Too’ (Inc 19 March 2018) <https://www.inc.com/jeremy-goldman/how-companies-like-amazon-google-turn-data-into-a-competitive-advantage-how-you-can-too.html> accessed 16 October 2021

³⁵ Dan Patterson, ‘Facebook data privacy scandal: A cheatsheet’ (Tech Republic 30 July 2020) <https://www.techrepublic.com/article/facebook-data-privacy-scandal-a-cheat-sheet/> accessed 16 October 2021

³⁶ Deepa Seetharaman, Kirsten Grind, ‘Facebook’s Lax Data Policies Led to Cambridge Analytica Crisis’ (Wall Street Journal 20 March 2018) <https://www.wsj.com/articles/facebooks-lax-data-policies-led-to-cambridge-analytica-crisis-1521590720?mod=ITP_pageone_0&tesla=y> accessed 16 October 2021