1 Introduction and Definitions
Crowdsourcing is a well-established concept in the scientific community, used for instance by Jeff Howe and Mark Robinson in 2005 to describe how businesses were using the Internet to outsource work to the crowd , but can be dated back up to 1849 (weather prediction in the US). Crowdsourcing has enabled a huge number of new engineering rules and commercial applications. To better define crowdsourcing in the context of network measurements, a seminar was held in Würzburg, Germany 25-26 September 2019 on the topic “Crowdsourced Network and QoE Measurements”. It notably showed the need for releasing a white paper, with the goal of providing a scientific discussion of the terms “crowdsourced network measurements” and “crowdsourced QoE measurements”. It describes relevant use cases for such crowdsourced data and its underlying challenges.
The outcome of the seminar is the white paper , which is – to our knowledge – the first document covering the topic of crowdsourced network and QoE measurements. This document serves as a basis for differentiation and a consistent view from different perspectives on crowdsourced network measurements, with the goal of providing a commonly accepted definition in the community. The scope is focused on the context of mobile and fixed network operators, but also on measurements of different layers (network, application, user layer). In addition, the white paper shows the value of crowdsourcing for selected use cases, e.g., to improve QoE, or address regulatory issues. Finally, the major challenges and issues for researchers and practitioners are highlighted.
This article now summarizes the current state of the art in crowdsourcing research and lays down the foundation for the definition of crowdsourcing in the context of network and QoE measurements as provided in . One important effort is first to properly define the various elements of crowdsourcing.
The word crowdsourcing itself is a mix of the crowd and the traditional outsourcing work-commissioning model. Since the publication of , the research community has been struggling to find a definition of the term crowdsourcing [3,4,5] that fits the wide variety of its applications and new developments. For example, in ITU-T P.912, crowdsourcing has been defined as:
Crowdsourcing consists of obtaining the needed service by a large group of people, most probably an on-line community.
The above definition has been written with the main purpose of collecting subjective feedback from users. For the purpose of this white paper focused on network measurements, it is required to clarify this definition. In the following, the term crowdsourcing will be defined as follows:
Crowdsourcing is an action by an initiator who outsources tasks to a crowd of participants to achieve a certain goal.
The following terms are further defined to clarify the above definition:
A crowdsourcing action is part of a campaign that includes processes such as campaign design and methodology definition, data capturing and storage, and data analysis.
The initiator of a crowdsourcing action can be a company, an agency (e.g., a regulator), a research institute or an individual.
Crowdsourcing participants (also “workers” or “users”) work on the tasks set up by the initiator. They are third parties with respect to the initiator, and they must be human.
The goal of a crowdsourcing action is its main purpose from the initiator’s perspective.
The goals of a crowdsourcing action can be manifold and may include, for example:
- Gathering subjective feedback from users about an application (e.g., ranks expressing the experience of users when using an application)
- Leveraging existing capacities (e.g., storage, computing, etc.) offered by companies or individual users to perform some tasks
- Leveraging cognitive efforts of humans for problem-solving in a scientific context.
In general, an initiator adopts a crowdsourcing approach to remedy a lack of resources (e.g., running a large-scale computation by using the resources of a large number of users to overcome its own limitations) or to broaden a test basis much further than classical opinion polls. Crowdsourcing thus covers a wide range of actions with various degrees of involvement by the participants.
In crowdsourcing, there are various methods of identifying, selecting, receiving, and retributing users contributing to a crowdsourcing initiative and related services. Individuals or organizations obtain goods and/or services in many different ways from a large, relatively open and often rapidly-evolving group of crowdsourcing participants (also called users). The use of goods or information obtained by crowdsourcing to achieve a cumulative result can also depend on the type of task, the collected goods or information and final goal of the crowdsourcing task.
1.2 Roles and Actors
Given the above definitions, the actors involved in a crowdsourcing action are the initiator and the participants. The role of the initiator is to design and initiate the crowdsourcing action, distribute the required resources to the participants (e.g., a piece of software or the task instructions, assign tasks to the participants or start an open call to a larger group), and finally to collect, process and evaluate the results of the crowdsourcing action.
The role of participants depends on their degree of contribution or involvement. In general, their role is described as follows. At least, they offer their resources to the initiator, e.g., time, ideas, or computation resources. In higher levels of contributions, participants might run or perform the tasks assigned by the initiator, and (optionally) report the results to the initiator.
Finally, the relationships between the initiator and the participants are governed by policies specifying the contextual aspects of the crowdsourcing action such as security and confidentiality, and any interest or business aspects specifying how the participants are remunerated, rewarded or incentivized for their participation in the crowdsourcing action.
2 Crowdsourcing in the Context of Network Measurements
The above model considers crowdsourcing at large. In this section, we analyse crowdsourcing for network measurements, which creates crowd data. This exemplifies the broader definitions introduced above, even if the scope is more restricted but with strong contextual aspects like security and confidentiality rules.
2.1 Definition: Crowdsourced Network Measurements
Crowdsourcing enables a distributed and scalable approach to perform network measurements. It can reach a large number of end-users all over the world. This clearly surpasses the traditional measurement campaigns launched by network operators or regulatory agencies able to reach only a limited sample of users. Primarily, crowd data may be used for the purpose of evaluating QoS, that is, network performance measurements. Crowdsourcing may however also be relevant for evaluating QoE, as it may involve asking users for their experience – depending on the type of campaign.
With regard to the previous section and the special aspects of network measurements, crowdsourced network measurements/crowd data are defined as follows, based on the previous, general definition of crowdsourcing introduced above:
Crowdsourced network measurements are actions by an initiator who outsources tasks to a crowd of participants to achieve the goal of gathering network measurement-related data.
Crowd data is the data that is generated in the context of crowdsourced network measurement actions.
The format of the crowd data is specified by the initiator and depends on the type of crowdsourcing action. For instance, crowd data can be the results of large scale computation experiments, analytics, measurement data, etc. In addition, the semantic interpretation of crowd data is under the responsibility of the initiator. The participants cannot interpret the crowd data, which must be thoroughly processed by the initiator to reach the objective of the crowdsourcing action.
We consider in this paper the contribution of human participants only. Distributed measurement actions solely made by robots, IoT devices or automated probes are excluded. Additionally, we require that participants consent to contribute to the crowdsourcing action. This consent might, however, vary from actively fulfilling dedicated task instructions provided by the initiator to merely accepting terms of services that include the option of analysing usage artefacts generated while interacting with a service.
It follows that in the present document, it is assumed that measurements via crowdsourcing (namely, crowd data) are performed by human participants aware of the fact that they are participating in a crowdsourcing campaign. Once clearly stated, more details need to be provided about the slightly adapted roles of the actors and their relationships in a crowdsourcing initiative in the context of network measurements.
2.2 Active and Passive Measurements
For a better classification of crowdsourced network measurements, it is important to differentiate between active and passive measurements. Similar to the current working definition within the ITU-T Study Group 12 work item “E.CrowdESFB” (Crowdsourcing Approach for the assessment of end-to-end QoS in Fixed Broadband and Mobile Networks), the following definitions are made:
Active measurements create artificial traffic to generate crowd data.
Passive measurements do not create artificial traffic, but measure crowd data that is generated by the participant.
For example, a typical case of an active measurement is a speed test that generates artificial traffic against a test server in order to estimate bandwidth or QoS. A passive measurement instead may be realized by fetching cellular information from a mobile device, which has been collected without additional data generation.
2.3 Roles of the Actors
Participants have to commit to participation in the crowdsourcing measurements. The level of contribution can vary depending on the corresponding effort or level of engagement. The simplest action is to subscribe to or install a specific application, which collects data through measurements as part of its functioning – often in the background and not as part of the core functionality provided to the user. A more complex task-driven engagement requires a more important cognitive effort, such as providing subjective feedback on the performance or quality of certain Internet services. Hence, one must differentiate between participant-initiated measurements and automated measurements:
Participant-initiated measurements require the participant to initiate the measurement. The measurement data are typically provided to the participant.
Automated measurements can be performed without the need for the participant to initiate them. They are typically performed in the background.
A participant can thus be a user or a worker. The distinction depends on the main focus of the person doing the contribution and his/her engagement:
A crowdsourcing user is providing crowd data as the side effect of another activity, in the context of passive, automated measurements.
A crowdsourcing worker is providing crowd data as a consequence of his/her engagement when performing specific tasks, in the context of active, participant-initiated measurements.
The term “users” should, therefore, be used when the crowdsourced activity is not the main focus of engagement, but comes as a side effect of another activity – for example, when using a web browsing application which collects measurements in the background, which is a passive, automated measurement.
“Workers” are involved when the crowdsourced activity is the main driver of engagement, for example, when the worker is paid to perform specific tasks and is performing an active, participant-initiated measurement. Note that in some cases, workers can also be incentivized to provide passive measurement data (e.g. with applications collecting data in the background if not actively used).
In general, workers are paid on the basis of clear guidelines for their specific crowdsourcing activity, whereas users provide their contribution on the basis of a more ambiguous, indirect engagement, such as via the utilization of a particular service provided by the beneficiary of the crowdsourcing results, or a third-party crowd provider. Regardless of the participants’ level of engagement, the data resulting from the crowdsourcing measurement action is reported back to the initiator.
The initiator of the crowdsourcing measurement action often has to design a crowdsourcing measurement campaign, recruit the participants (selectively or openly), provide them with the necessary means (e.g. infrastructure and/or software) to run their action, provide the required (backend) infrastructure and software tools to the participants to run the action, collect, process and analyse the information, and possibly publish the results.
2.4 Dimensions of Crowdsourced Network Measurements
In light of the previous section, there are multiple dimensions to consider for crowdsourcing in the context of network measurements. A preliminary list of dimensions includes:
- Level of subjectivity (subjective vs. objective measurements) in the crowd data
- Level of engagement of the participant (participant-initiated or background) or their cognitive effort, and awareness (consciousness) of the measurement level of traffic generation (active vs. passive)
- Type and level of incentives (attractiveness/appeal, paid or unpaid)
Besides these key dimensions, there are other features which are relevant in characterizing a crowdsourced network measurement activity. These include scale, cost, and value; the type of data collected; the goal or the intention, i.e. the intention of the user (based on incentives) versus the intention of the crowdsourcing initiator of the resulting output.
In Figure 1, we have illustrated some dimensions of network measurements based on crowdsourcing. Only the subjectivity, engagement and incentives dimension are displayed, on an arbitrary scale. The objective of this figure is to show that an initiator has a wide range of combinations for crowdsourcing action. The success of a measurement action with regard to an objective (number of participants, relevance of the results, etc.) is multifactorial. As an example, action 1 may indicate QoE measurements from a limited number of participants and action 2 visualizes the dimensions for network measurements by involving a large number of participants.
The attendees of the Würzburg seminar on “Crowdsourced Network and QoE Measurements” have produced a white paper, which defines terms in the context of crowdsourcing for network and QoE measurements, lists of relevant use cases from the perspective of different stakeholders, and discusses the challenges associated with designing crowdsourcing campaigns, analyzing, and interpreting the data. The goal of the white paper is to provide definitions to be commonly accepted by the community and to summarize the most important use-cases and challenges from industrial and academic perspectives.
 White Paper on Crowdsourced Network and QoE Measurements – Definitions, Use Cases and Challenges (2020). Tobias Hoßfeld and Stefan Wunderer, eds., Würzburg, Germany, March 2020. doi: 10.25972/OPUS-20232.