In social networks that span several racial, ethnic or religious groups, there are generally fewer links between groups and more within groups than uniform random matching would predict. One study exploring this is Currarini, Jackson and Pin (2009).
When we observe fewer intergroup links than equal-probability matching predicts, the natural question is who discriminates against whom. If group A and group B don’t form links, is it because A does not want to link to B or because B does not want to link to A? If we observe more couples where the man is white and the woman is Asian than expected from uniform random matching, is this due to the ‘yellow fever’ of white men or a preference of Asian women for white men? It could also be caused by white men and Asian women meeting more frequently than other groups, but this particular kind of biased matching seems unlikely.
Assume both sides’ consent is needed for a link to form. Then the probability that a member of A and a member of B form a link is the product of the probability that A accepts B and the probability that B accepts A. We can interpret these probabilities as the preference of A for B and of B for A, and say that if the preference of A for A is stronger than the preference of A for B, then A discriminates against B. From data on undirected links alone, only the product of the two probabilities can be calculated, not each probability separately. So based on this data alone it is impossible to tell who discriminates against whom.
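To make the identification problem concrete, here is a minimal sketch with made-up acceptance probabilities. The function name and all numbers are illustrative assumptions, not taken from any data set. Three very different stories about who is picky produce the same observable link probability.

```python
from math import isclose


def link_probability(a_accepts_b: float, b_accepts_a: float) -> float:
    """Probability of an A-B link when both sides must consent independently."""
    return a_accepts_b * b_accepts_a


# Hypothetical scenarios: only the product is observable in undirected data.
only_a_is_picky = link_probability(0.2, 1.0)
only_b_is_picky = link_probability(1.0, 0.2)
both_are_picky = link_probability(0.5, 0.4)

# All three scenarios yield the same link probability.
print(only_a_is_picky, only_b_is_picky, both_are_picky)
print(isclose(only_a_is_picky, only_b_is_picky)
      and isclose(only_b_is_picky, both_are_picky))
```

An observer who sees only the frequency of A-B links cannot distinguish these scenarios from each other.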
If there are more than two groups in the society, then the same problem occurs for each pair of groups. Under the additional assumption that a person treats all other groups alike, and at most treats his own group differently from the rest, the preference of each group for each group can be calculated. This assumption is unlikely to hold in practice though.
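One way to see why the equal-treatment assumption helps is the following sketch for exactly three groups. It is an illustration under stated assumptions, not a formula given in the text: each group g accepts any outsider with a single probability q_g, so the A-B link probability is q_A times q_B, and all cross-group link probabilities are positive. The function name and the numbers are hypothetical.

```python
from math import isclose, sqrt


def out_group_acceptance(p_ab: float, p_ac: float, p_bc: float):
    """Recover (q_a, q_b, q_c) from the three cross-group link probabilities.

    Assumes p_ab = q_a * q_b, p_ac = q_a * q_c, p_bc = q_b * q_c,
    with all three probabilities strictly positive.
    """
    q_a = sqrt(p_ab * p_ac / p_bc)
    q_b = sqrt(p_ab * p_bc / p_ac)
    q_c = sqrt(p_ac * p_bc / p_ab)
    return q_a, q_b, q_c


# Hypothetical true acceptance probabilities toward outsiders.
true_q = (0.5, 0.4, 0.8)
p_ab = true_q[0] * true_q[1]
p_ac = true_q[0] * true_q[2]
p_bc = true_q[1] * true_q[2]

recovered = out_group_acceptance(p_ab, p_ac, p_bc)
print(recovered)
```

With three products and three unknowns the system can be solved, which is not possible with two groups, where there is one product and two unknowns. Under the same assumption, the preference of a group for its own members follows from the within-group link probability, which is that preference squared.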
If only one side’s consent is needed for a link to form, then from data on these directed links, the preference of each group for each group can again be calculated. The preference of A for B is just the fraction of A’s links that are to B, divided by the fraction of B in the population.
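The ratio described above can be computed directly from counts of directed links. The sketch below uses hypothetical link counts and population shares; the function name `preference_index` is an assumption made for illustration. A value above 1 means the group is chosen more often than its population share would predict, and a value below 1 means it is chosen less often.

```python
from math import isclose


def preference_index(links_from_group: dict, population_share: dict) -> dict:
    """Share of a group's outgoing links going to each target group,
    divided by that target group's share of the population."""
    total_links = sum(links_from_group.values())
    return {
        target: (count / total_links) / population_share[target]
        for target, count in links_from_group.items()
    }


# Hypothetical data: links initiated by members of A, and population shares.
links_from_a = {"A": 60, "B": 40}
population_share = {"A": 0.5, "B": 0.5}

prefs_of_a = preference_index(links_from_a, population_share)
print(prefs_of_a)
```

In this made-up example A directs 60% of its links to its own group, which makes up half the population, so its preference for A is 1.2 and its preference for B is 0.8.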
With additional data on who initiated a link or how much effort each side is putting into a link, the preference parameters may be identifiable. The online dating website OKCupid has some statistics on how likely each race is to initiate contact with each other race and how likely each race is to respond to an initial message from another race. If these statistics covered the whole population, then it would be easy to calculate who discriminates against whom. In the case of a dating website, however, the set of people using it is unlikely to be a representative sample of the population. This may change the results in a major way.
If the average attractiveness of group A on the dating website (not in the whole population) is higher than that of other groups, then group A is likely to receive more initial contact attempts just because its members are attractive. They can also afford to respond to fewer contact attempts since, being attractive, they can be pickier and make less effort to form links. If we disregard the nonrepresentative sample problem and just calculate the preferences of all groups for all other groups, then all groups will be found to discriminate in favour of group A, and group A will be found to discriminate against all others. But in the general population this may not be the case.
The average attractiveness of group A on the dating website can differ from their average attractiveness in the population if the website is more popular with group A and there is adverse selection into using the website. Adverse selection here means that only the people too unattractive to find a match by chance during their everyday life make the extra effort of starting to use the website to look for matches. So the average attractiveness of the users from every group is lower than the population’s average attractiveness.
If a larger fraction of group A prefers to use the website and the users from all groups are drawn from the bottom end of the attractiveness distribution, then the website is relatively more popular with attractive members of A than with attractive members of other groups. Therefore the average attractiveness of those members of A using the website is higher than the average attractiveness of those members of other groups using the website. The higher preference of group A for using the website must be exogenous, i.e. due to something other than A’s lower average attractiveness; otherwise this preference does not cause A’s attractiveness on the website to rise. It could be that members of A are more familiar with the internet, so have a lower effort cost of using any website. Or there may be a social stigma against using online dating sites, which could be smaller in group A than in other groups.
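The selection argument can be illustrated with a small deterministic sketch. All numbers are hypothetical: both groups are assumed to have the same attractiveness distribution (an evenly spaced grid on [0, 1]), users are drawn strictly from the bottom of that distribution, and 60% of group A but only 20% of group B use the website.

```python
from statistics import mean


def mean_attractiveness_of_users(population: list, usage_fraction: float) -> float:
    """Average attractiveness of website users when the least attractive
    `usage_fraction` of the group signs up (adverse selection)."""
    ranked = sorted(population)
    n_users = max(1, round(usage_fraction * len(ranked)))
    return mean(ranked[:n_users])


# Hypothetical identical attractiveness distribution for both groups.
population = [i / 1000 for i in range(1001)]

users_a = mean_attractiveness_of_users(population, 0.6)  # A uses the site more
users_b = mean_attractiveness_of_users(population, 0.2)
population_mean = mean(population)

print(users_a, users_b, population_mean)
```

Both user pools are less attractive on average than the population, yet the users from A are more attractive on average than the users from B, even though the two groups are identical in the population.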
If statistics from a nonrandom sample show discrimination, there may or may not be actual discrimination in the population, depending on the bias of the sample. It could also be that the actual discrimination is larger than the sample shows, if the sample bias goes in the opposite direction from the one described above.