Crowdsourcing applications addressing diseases and public health: a perspective on COVID-19 infestation

Since its emergence in Wuhan, China, in December 2019, the COVID-19 disease has spread among humans with unprecedented speed and on a global scale across multiple geographic locations, sparking extensive confusion and debate in public health and earning it the status of a pandemic. The inability to restrain the outbreak in its early stages has multiplied the risk of fatal complications. Crowdsourcing can pool crowd knowledge to solve problems and has the potential to revolutionize health care by using internet sources, data mining, e-health trackers, etc. to collect and assess data directly from the point source (individual level) at a pace that matches or exceeds the rate of spread of infection. The present study provides perspectives on crowdsourcing in health care and public health services by critically comparing its strengths and challenges with those of traditional methods. To this end, the authors have designed three models for improving public health care in the wake of the COVID-19 infestation.


Crowdsourcing and the pandemic (COVID-19)
The COVID-19 disease was first officially declared by the WHO on December 31, 2019, citing occurrences of an influenza-like disease in Wuhan city in the Hubei province of China [1]. According to data on sequenced genomes of the virus, phylogenetic analysis revealed that the most recent common ancestor of SARS-CoV-2 first occurred during October-December 2019, after which the disease spread at an alarming rate, with the WHO currently reporting more than 19.7 crore (197 million) confirmed cases worldwide [2,3].
From a public health perspective, when a pandemic disease like COVID-19 spreads, the first strategy for disease control and surveillance lies in collecting a pool of data on the number of confirmed, probable and suspected cases to formulate an indispensable data list called the line list, which provides a quick assessment of the growth of the pandemic.
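As a rough illustration of the line list described above, the following Python sketch stores one record per case and tallies the three case categories named in the text. The class name, field names and example values are illustrative assumptions, not a format prescribed by the source.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date

# Case categories named in the text; everything else here is an assumption.
STATUSES = ("confirmed", "probable", "suspected")


@dataclass(frozen=True)
class LineListEntry:
    case_id: str        # hypothetical identifier for one reported case
    report_date: date   # day on which the case was reported
    location: str       # hypothetical free-text location label
    status: str         # one of STATUSES


def tally_by_status(entries):
    """Count line list entries per case status, rejecting unknown statuses."""
    counts = Counter({status: 0 for status in STATUSES})
    for entry in entries:
        if entry.status not in STATUSES:
            raise ValueError(f"unknown case status: {entry.status!r}")
        counts[entry.status] += 1
    return dict(counts)


def daily_confirmed(entries):
    """Return {report_date: number of confirmed cases}, ordered by date."""
    per_day = Counter(e.report_date for e in entries if e.status == "confirmed")
    return dict(sorted(per_day.items()))
```

Keeping one record per case, rather than only aggregate counts, is what allows the list to be refreshed and re-tallied as new reports arrive.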

Critical assessment of traditional and crowdsourcing processes - advantages and challenges
The COVID-19 pandemic has spread across all quarters of the globe within an unprecedented time scale, since its emergence in December of 2019 [1].
Even 1.5 years on (December 2019 to July 2021), the spread of the disease has yet to be controlled and arrested, which is why crowdsourcing has been discussed as a measure.
Apart from the advantages it offers, crowdsourcing also poses challenges that need to be addressed, including differing opinions, behaviors, reporting styles and health-seeking attitudes of individuals across geographical locations, age groups and time, even though most present-day crowdsourcing systems take all these factors into consideration [20]. Such systems, if acknowledged and implemented globally at crucial times like the COVID-19 pandemic, would not only help address and control the pandemic but would also generate sufficient data for future researchers to study the epidemiological nature of a pandemic with greater precision [21]. Recent research has indicated that traditional data collection systems for disease surveillance face limitations such as financial barriers, poor timeliness, uneven selection of the population and contributor biases, which may allow infectious disease outbreaks and transmission in the population to grow to an alarming level [22].
Even the WHO states that most reports and field data are limited to, and obtained from, organizations and technical bodies credible enough to contribute to an international outbreak response [23]. Crowdsourcing, by tapping data from a larger "big crowd" of people comprising internet users, non-experts, diseased individuals and their families, experts, academicians, researchers etc., can help close the gap in procuring information from individual-level sources, whilst decreasing the time between disease outbreak and data generation [24].
Active public engagement, awareness generation and health education of the general public are further, subsidiary advantages of crowdsourcing, as suggested by anecdotal evidence [10].
Disease spread and public health response data can be tracked accurately, so that health interventions can be strategized faster.
Crowdsourcing tools can also extend data coverage spatially to locations that are not covered by traditional surveillance plots [25,26]. Several disease dynamic factors, such as social and environmental impact and the contact patterns of an infectious disease, can also be learned more readily from crowdsourced data than from traditional systems [27].
However, crowdsourced data remains subject to certain limitations, namely detection rate, data credibility, limited reach in areas without internet coverage, lower specificity, demographic biases and false alarms, which recent developments such as emergency-room-crowding, alignment of crowdsourced data with diagnostic designs, clinical alignment etc. are trying to reduce [28-30].
Case scenarios - application of crowdsourcing in health care and disease surveillance
interventions [47]. In India, the Aarogya Setu mobile app has been launched by the government to make it easier to connect with health services [48].

Web search logs:
In previous years, Google Flu Trends, as an approach to crowdsourced data, has been appraised by several researchers for forecasting influenza-like-illness cases in the population [49]. The use of web search logs was further validated by the research of Wiwanitkit, where logistic regression models showed better performance when data from Google Trends were used as predictor variables [50].
Through the parameters of cost, data size, design difficulty, assessment difficulty and the participative nature of the crowd, an interrelation between active and passive crowdsourcing has been drawn. Active crowdsourced data requires the involvement of professional human resources (designers, artists, computational biologists, computer engineers, etc.) in designing the tools required for tapping crowd data, and is a costly process; whereas passive crowdsourcing depends on social media sources and data mining for procurement of data, which is often a less expensive method of data extraction. Also, since passive crowdsourced data is openly accessible to the social media populace, the size of the data mined is larger owing to greater participation, unlike the limited data size and participation rate in active crowdsourced data. In addition, owing to the greater design difficulty, result assessment in active crowdsourcing is more rigorous and requires more, and more skilled, human resources than in passive crowdsourcing.
All of this serves the endeavour of formulating disease intervention policies for public health and public welfare. Data collected directly from citizens (individual level) would help the government and policy makers understand their opinions and knowledge of the COVID-19 infestation in real time, at a pace faster than or similar to the spread of viral infection. Faster data collection would help to amend current policies and to improvise or formulate new public health policies, which would provide faster relief and help to arrest the pandemic efficiently. Also, by using emerging technologies in crowdsourcing, such as automated web searches via APIs (Application Programming Interfaces), both active and passive crowdsourced information can be procured on time. Finally, the crowdsourced information procured by the government can be transferred in real time to relevant municipal bodies, local bodies, service providers, healthcare bodies, R&D units and educational interfaces, so that fast action can be taken on public health.

Perspectives and discussion
Furthermore, through this framework, health care professionals and hospitals can be equipped to deliver patient-centric and person-centric care, as crowdsourcing approaches tap data directly from individual patient-level sources, designing care principles not only for, but with the involvement of, patients and the general public [51]. Ultimately, this framework would provide a sustainable outlook on healthcare provision and disease surveillance. However, huge gaps in the quality of the collected metadata exist in several fields, namely health, molecular biology, genomics and virology, and this challenge remains to be addressed [53]. It can be addressed by making necessary provisions available to the respondent crowd, such as free access to validated articles from registered journals or government reports, so that they can refer to and learn from these before contributing their knowledge.

Future and
In the present study, effective models have been designed, aligning the concepts of computational biology and healthcare to address the issues of the COVID-19 pandemic, with a focus on healthcare intervention and disease surveillance. It has been explained in detail how crowdsourcing can serve as the ultimate system for government bodies, local and healthcare bodies, research units and educational institutions alike, by running in real time to provide individual-level data that can not only support patient-centric and person-centric care for infected patients but also aid the government in formulating health intervention policies.
The line list also helps in figuring out the potential rate of spread of the pandemic and in providing sufficient evidence on isolation and quarantine periods to help monitor detected cases. The second strategy requires refreshing the former line list with a fresh pool of data in real time, as more news and knowledge on epidemiology, virology and clinical cases surface with the disease's progression. In this regard, the work of Kaiyuan Sun et al. holds great value, as the researchers harnessed real-time data from social media platforms that aligned closely with data from the Chinese CDC [5]. By focusing on the social network profiles of medical professionals, they filtered credible data into a crowdsourced line list, compiling data on susceptible patients from a point source (individual source), which they calculated daily at the provincial level during January 2020. Crowdsourcing acts as a tool for real-time data collection directly from the point source of diseased individuals or suspected cases [6]. For predicting the future spread of an infectious disease like COVID-19, knowledge of disease dynamics is crucial, and spatiotemporal models of disease surveillance data help to understand the processes and patterns of disease spread. Often the data is acquired from hospitals or from environmental or government census sources, which, though validated and robust, have other limitations, namely contributor-oriented biases, latency in reports, unspecific demographic resolution and high costs [7, 8]. Time is an important factor in dealing with the spread and control of infectious diseases like COVID-19, and faster procurement of real-time data is always beneficial for the population, as it helps to generate strategies for disease control and surveillance [9]. By avoiding infrastructure costs or regulations, crowdsourcing increases data mobility and speed, thereby decreasing time. It generates real-time data at a speed that can match the speed of COVID-19 spread, and can often generate data faster than the serial interval of infectious diseases [10, 11]. Hence, crowdsourcing actively helps to close the gaps in data collected through conventional means and sources.
Crowdsourcing techniques in the era of big data include participation-based internet surveillance systems for diseases, where disease symptoms are reported by actively participating individuals who voluntarily submit their data through web portals, mobile health trackers, mobile apps, tweets or emails, generating a huge pool of real-time data within a short period [12]. This was successfully conducted for the influenza outbreak in 2009, and the same, if applied to the COVID-19 infestation, can provide several advantages over traditional disease surveillance measures in public health, such as increased speed, data size, research utility, data validation and accessibility [13-17]. Other computational online services, like HealthMap, used in Canada and hosted by Harvard University, aid in the intelligent synthesis of disease outbreak data sources and can hence provide a multitude of structured or unstructured reports for tracking COVID-19 outbreaks in the population [18]. Yet another advantage of crowdsourcing tools is the augmented engagement of the local populace, increasing their awareness of and involvement in the pandemic and thereby acting as a pathway of health care education for the public [19]. The government, if supplied with real-time data at a faster pace, can develop health interventions and strategies for preventing future upsurges of the disease. Also, such future developments can be conducted in parallel with, and without negating, data collected from official and traditional sources, within the confidentiality allowed.
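The crowdsourced line list approach described above (keeping only reports from credible sources such as medical professionals, then counting cases daily at the provincial level) can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in [5]: the `Report` fields, the boolean credibility flag and the de-duplication key are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Report:
    patient_key: str                      # hypothetical de-identified key for spotting repeats
    province: str
    report_date: date
    source_is_medical_professional: bool  # assumed credibility flag


def build_crowdsourced_line_list(reports):
    """Keep credible reports only and drop repeat mentions of the same patient."""
    seen = set()
    line_list = []
    # Process in date order so the earliest report of each patient is retained.
    for report in sorted(reports, key=lambda r: r.report_date):
        if not report.source_is_medical_professional:
            continue
        if report.patient_key in seen:
            continue
        seen.add(report.patient_key)
        line_list.append(report)
    return line_list


def daily_counts_by_province(line_list):
    """Return {(province, report_date): number of cases}."""
    counts = defaultdict(int)
    for report in line_list:
        counts[(report.province, report.report_date)] += 1
    return dict(counts)
```

In practice, credibility would rarely reduce to a single flag; the flag here only marks where such a screening rule would plug in.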
Mallick D. et al: Crowdsourcing applications addressing diseases International Journal of Medical Research and Review 2021;9(4)
In the present prospective study, a critical assessment of credible literature sources has first been conducted, focusing on crowdsourcing techniques and tools for disease surveillance and collection of health care data. Following this, several perspectives have been drawn on crowdsourcing as a potential system to aid better and faster surveillance of the COVID-19 infestation, by creating models for the same. Finally, the developed models have been discussed alongside their limitations, followed by a conclusion in which the authors critically address the potential of crowdsourcing in arresting the COVID-19 infestation. Future scopes of crowdsourcing systems have also been discussed to provide knowledge to researchers, academicians and government bodies alike, for designing strategies through which faster disease intervention measures can be developed during similar pandemic situations in the future.
According to the work of Fourati and colleagues, published in Nature Communications, crowdsourcing can serve as a potent tool for predicting susceptibility to viral infection in humans through predictive analytics [31]. The researchers conducted a community-based assessment for predicting symptomatic physiologic responses, through the identification of molecular predictors, in humans exposed to any one of the H1N1, RSV, rhinovirus and H3N2 viruses. The study was made possible by the collection of a large data pool through crowdsourced processes within a short period, and it aided in exploring human responses to respiratory viruses even before viral exposure [31]. Another successful case of disease research built on crowdsourcing systems was the surveillance of influenza (an infectious disease) by the CDC, US (Centers for Disease Control and Prevention, United States), which served as the core metric for measuring influenza activity nationally, as depicted in Figure 1a and Figure 1b [32].

Figure 1a. Weekly visits (average) to the sentinel CDC sites, in percentage, for the 2000-2011 seasons, with holiday weeks shaded in gray. Source: [32]

Model 1. The applicability of crowdsourcing approaches in improving public health.
The above model (Model 1) is designed to align computational biology with health care research. In the first stage, data will be collected in real time from the "big crowd" comprising individual sources like the general public, internet users, experts and non-experts, via active and passive crowdsourcing tools like challenge contests, disease-based e-surveys, scientific games or labor markets (Amazon Mechanical Turk). From this stage, one moves into the second stage of the process, namely data procurement. The huge pool of data at this stage contains both credible and non-credible information, which needs to be filtered and screened by analysts to be developed into a line list. In the third stage, the credible data pool will be analyzed, and the processed information will aid in solving several problems concerning biomedicine, health care, molecular genomics, identification of infectious disease seasonality and disease patterns, COVID-19 spread, and the compilation of suspected, probable and affected cases in real time, specific to a particular environmental arena. The fourth stage depicts how the solutions analyzed will serve as valuable real-time data for conducting health care research aimed at improving public health. In the perspective of COVID-19, the data can help fuel the work of other researchers, scientists, health care staff and service providers to fill knowledge vacuums in medical research, to design medicines, and to identify disease patterns and seasonality so as to curb false rumours, crowd confusion and fear. This would also aid in faster and timely generation of vaccines and in the formulation of even better health care interventions. It would help scientists to work on the present gaps in the system by developing enhanced algorithms and automated bots for conducting faster web searches via data mining.
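The four stages of Model 1 described above (collection, procurement, analysis, and use of the results) can be read as a simple data pipeline. The sketch below is one hypothetical way to wire such stages together in Python; the function names, the dictionary keys (`region`, `status`) and the credibility rule passed in by the caller are assumptions made for illustration and are not part of the authors' model.

```python
from collections import Counter


def collect(active_sources, passive_sources):
    """Stage 1: pool raw submissions, tagging each with its channel."""
    pooled = [dict(item, channel="active") for item in active_sources]
    pooled += [dict(item, channel="passive") for item in passive_sources]
    return pooled


def procure(raw_pool, is_credible):
    """Stage 2: screen the pool and keep credible items as the line list."""
    return [item for item in raw_pool if is_credible(item)]


def analyze(line_list):
    """Stage 3: summarize the line list as case counts per (region, status)."""
    return Counter((item["region"], item["status"]) for item in line_list)


def disseminate(summary):
    """Stage 4: format the summary as sorted text rows for downstream users."""
    return [
        f"{region}: {status} = {count}"
        for (region, status), count in sorted(summary.items())
    ]


def run_pipeline(active_sources, passive_sources, is_credible):
    """Run the four stages in order and return the formatted rows."""
    raw_pool = collect(active_sources, passive_sources)
    line_list = procure(raw_pool, is_credible)
    return disseminate(analyze(line_list))
```

Passing the credibility rule in as a function keeps the screening step, which the text assigns to analysts, separate from the mechanics of the pipeline.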
Applicability Parameter interrelation in passive crowdsourcing and active crowdsourcing
Through the above model (Model 2), the authors present the interrelation amongst several parameters of active and passive crowdsourced data. Whilst active crowdsourced data is collected through active voluntary participation of the crowd from sources such as labor markets (Amazon Mechanical Turk, CrowdFlower, etc.), citizen challenges, online e-surveys and scientific games, passive crowdsourced data is procured via data mining and social networking, by tapping social media sources from a crowd that is passively and unconsciously participating in the generation of relevant information.

3. Government role in public health care policy via crowdsourcing
Model 3. Improvising public health care policy through crowdsourcing platforms
From the above model (Model 3), the authors opine that data obtained through crowdsourcing platforms can be utilized by the government to conduct experiments on public health status and the rate of spread of infection, determine the seasonality of viral infection during pandemics, etc.
The authors have offered three perspectives on the concept of utilizing crowdsourcing as a government intervention tool. A detailed search of credible databases (PubMed, Embase) and search logs, using MeSH terms, has been conducted by both authors on research from the last 10 years. The articles and studies finally selected were then critically assessed and analyzed by the authors on their respective subjects to develop and throw new light on the concept. The authors have also designed three models on crowdsourcing, depicting future scopes for how the government, in coalition with crowdsourcing platforms, can effectively shape and improvise public health policy in the wake of a pandemic.