We also note that models of information diffusion are based on classical epidemiological models, e.g. SIR (susceptible-infected-recovered). In recent months, these models have been used extensively to model the spread of COVID-19. Hence, understanding and modelling viral processes has become a key research direction. Models of viral processes, such as the well-known SIR model, usually assume that the process is stochastic. While this model may be correct for the case for which it was created, i.e. describing the spread of disease, we showed in our previous work that it does not carry over to the spread of information. First, it does not take into account that information becomes outdated over time, so people share it less actively. Second, information spreads through multiple channels.
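To make the first limitation concrete, the following is a minimal sketch of a stochastic SIR-like cascade on a network in which the sharing probability may shrink at every step. Everything in it is an illustrative assumption rather than the model of our previous work: the random graph, the one-step infectious period, the geometric decay factor `decay`, and the function names `make_random_graph`, `cascade_size` and `cascade_sizes` are all hypothetical.

```python
import random


def make_random_graph(n, avg_degree, rng):
    """Undirected Erdos-Renyi-style graph as adjacency lists (illustrative only)."""
    p = avg_degree / (n - 1)
    adj = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    return adj


def cascade_size(adj, seed, share_prob, decay, rng):
    """Run one stochastic cascade and return the number of nodes reached.

    Assumptions (hypothetical): every node shares for exactly one step and
    then stops (SIR with a one-step infectious period); the sharing
    probability is multiplied by `decay` after each step, so decay=1.0
    gives the classical constant-probability case.
    """
    reached = {seed}
    frontier = [seed]
    p = share_prob
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in adj[u]:
                if v not in reached and rng.random() < p:
                    reached.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
        p *= decay  # information becomes less up to date over time
    return len(reached)


def cascade_sizes(adj, share_prob, decay, runs, rng):
    """Sample the cascade-size distribution from uniformly random seed nodes."""
    n = len(adj)
    return [
        cascade_size(adj, rng.randrange(n), share_prob, decay, rng)
        for _ in range(runs)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    graph = make_random_graph(300, 6.0, rng)
    for decay in (1.0, 0.7):
        sizes = cascade_sizes(graph, 0.3, decay, 500, rng)
        viral = sum(s > 0.5 * len(graph) for s in sizes) / len(sizes)
        mean = sum(sizes) / len(sizes)
        print(f"decay={decay}: mean size={mean:.1f}, viral fraction={viral:.2f}")
```

The script compares the constant-probability case (`decay=1.0`) with a decaying one on the same graph. Under these assumptions one would expect the decaying variant to produce fewer cascades that reach most of the network, which is the direction of the overestimation attributed to plain SIR; the specific numbers depend entirely on the assumed parameters.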
If we want to study information dissemination on the Twitter network, we also need to take into account other channels, e.g. mass media. In particular, omitting these effects causes the SIR model to overestimate the likelihood that information will become viral, i.e. reach almost the whole network. Our work (HT 2016) explains the experimentally observed cascade sizes by incorporating two effects:
This paper introduces the concept of the direction of information spread, i.e. from high-degree and high-trust nodes. The motivation for this assumption is the fact that people are more likely to share information coming from high-degree nodes. In other words, it seems that we are still far from a good understanding of the mechanism of information spread in social networks. This is despite the fact that viral processes in social networks can be traced very accurately. Our lack of understanding implies that we are unable to correctly assess the risks related to very rare events. In particular, to our knowledge, our paper (HT 2016) is the only case in which a metric that correctly accounts for rare events is used. This raises the question of whether, in epidemiological applications of such models, rare events such as the pandemic spread of diseases are correctly described. Another line of research on viral processes is the prediction of how popular a given piece of information will become.
It should be noted that these models take a completely different approach from the one adopted in our work. The typical approach is to build a regression model that uses the observed characteristics of a process to predict its further evolution. However, these models have limited effectiveness because they implicitly assume that the process is deterministic, whereas it is known to be stochastic, with an evolution that is not predetermined. The challenge is therefore to develop models that predict all possible continuations of the process, described as a probability distribution. In particular, only this type of approach can lead to statistically valid predictions of the chance that a process reaches the whole network. However, there are more important challenges here, which form the topics of the tasks of this project: