Friday, October 15, 2010

Eurogenes 500,000 SNP BGA Project

I've decided to advertise Polako's project since I've joined it. Polako aka David W. - a well-known figure of the anthro-related Internet - aims at "analyzing your ancestry in the same way that scientists analyze scientific samples in major peer-reviewed studies". See here for more information.



On Polako's plots, I'm FR1 (about my background : "My genetic results").

See : A map of Europe (+ Caucasus)



NB : The French Basque sample comes from the "Fondation Jean Dausset" and is also used by the private company 23andme. I tried to find out more about it but got no answer. I suppose that people from Lower Navarre were sampled, but that's just my feeling.


Polako is also running K=6 analyses but has not officially published his results. On his last West Eurasian run (which excluded the French Basques and the Sardinians, who somehow distorted the algorithmic results), my results were the following :

FR1

0% Caucasus

36.7% Mediterranean (peaking amongst Med people)
63.3% Western/Northwestern Europe (peaking amongst Orcadians)
0% Middle East
0% East Europe/Central Asia
0% North Africa (peaking amongst Mozabites)



NB : Those results are to be taken with a pinch of salt, as the clusters that get defined depend on the samples included. In other runs, the French Basques were the basis for one cluster (80-99%) on which I scored 70% (French people being around 45%), the remainder of my variation showing affinities with the cluster peaking among Orcadians. On another run with Sardinians included, the Sardinians defined one cluster on which the French Basques scored about 60%. In the end it's just algorithmic reduction : my own results fluctuate a lot depending on which samples are used.



Finally, Polako is running very interesting genome-wide comparisons that remain unpublished for the moment (he provides his project's participants with exclusive Excel sheets). Here are my top-10 results :

380K

1. French_Basque 0.750925
2. French_Basque 0.750724
3. French_Basque 0.750425
4. French_Basque 0.749576
5. French_Basque 0.749552
6. French_Basque 0.749528
7. French_Basque 0.749496
8. ES 0.749184
9. French_Basque 0.749165
10. French_Basque 0.749062


520K

1. French_Basque 0.746296
2. French_Basque 0.745259
3. French_Basque 0.744878
4. French_Basque 0.744197
5. French_Basque 0.744027
6. French_Basque 0.743888
7. French_Basque 0.743877
8. French_Basque 0.743755
9. IE3 0.74369
10. ES 0.743607


NB : - ES is an unidentified Spanish individual.
- IE3 is an Irish American.
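
One way to read the two lists is to count how often each label appears and how far apart the first and tenth scores are. Here is a minimal Python sketch that uses only the scores listed above. The names `top10` and `summarize` are illustrative, and the source does not say which similarity measure produced the scores.

```python
from collections import Counter

# Top-10 matches copied from the two lists above, as (label, score) pairs.
top10 = {
    "380K": [
        ("French_Basque", 0.750925), ("French_Basque", 0.750724),
        ("French_Basque", 0.750425), ("French_Basque", 0.749576),
        ("French_Basque", 0.749552), ("French_Basque", 0.749528),
        ("French_Basque", 0.749496), ("ES", 0.749184),
        ("French_Basque", 0.749165), ("French_Basque", 0.749062),
    ],
    "520K": [
        ("French_Basque", 0.746296), ("French_Basque", 0.745259),
        ("French_Basque", 0.744878), ("French_Basque", 0.744197),
        ("French_Basque", 0.744027), ("French_Basque", 0.743888),
        ("French_Basque", 0.743877), ("French_Basque", 0.743755),
        ("IE3", 0.74369), ("ES", 0.743607),
    ],
}


def summarize(matches):
    """Return (label counts, gap between the best and the worst listed score)."""
    labels = Counter(label for label, _ in matches)
    scores = [score for _, score in matches]
    return labels, round(max(scores) - min(scores), 6)


for run, matches in top10.items():
    labels, spread = summarize(matches)
    print(run, dict(labels), "spread:", spread)
```

On these numbers the French Basque label takes 9 of the 10 places at 380K and 8 of 10 at 520K, and the whole top 10 spans less than 0.003 in both runs, so the ordering within it rests on very small differences.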



Stay tuned and visit Polako's weblogs as he updates them.

10 comments:

  1. I'm no fan of Polako but he's knowledgeable admittedly.

    Whatever the case, what I wanted to mention is two things:

    1. The first calculation, without Basque reference, shows what happens when the right reference samples are not available: the affinity diverges towards other more or less similar samples and becomes blurrier. This also happens a lot in Structure-like analyses when not enough depth is achieved, sometimes suggesting "admixture" where there is only shallowness of analysis.

    2. I'm quite favorably surprised by your score of c. 70% Basque affinity, because that was approximately what I guessed for Béarnais people visually, if you recall. This seems to tell me that my subjective way of estimating affinity is not so far-fetched after all.

  2. I had feuds with Polako (both about genetics and politics) but he's indeed a pretty knowledgeable guy and a committed one.

    Here are my results according to the many runs he made.

    - K=6 (with Sardinians)
    Clusters were labelled according to the highest matching population.

    FR1

    ME/Bedouin - 0%
    Northern European - 44.18%
    ME/Palestinian - 0%
    SW European - 54.93%
    ME/Druze - 0.87%
    North African - 0%

    FR2

    ME/Bedouin - 1.1%
    Northern European - 57.1%
    ME/Palestinian - 4.5%
    SW European - 34.6%
    ME/Druze - 2.4%
    North African - 0%


    French Basque (I'm using the most archetypal one)

    ME/Bedouin - 0%
    Northern European - 37%
    ME/Palestinian - 0%
    SW European - 63%
    ME/Druze - 0%
    North African - 0%

    Sardinian (I use the second one)

    ME/Bedouin - 0%
    Northern European - 0%
    ME/Palestinian - 5%
    SW European - 95%
    ME/Druze - 0%
    North African - 0%


    - K=6 (without Sardinians)
    Clusters were labelled according to the highest matching population.

    FR1

    2.4% East Mediterranean
    68% Atlanto-Mediterranean
    3.7% Middle Eastern
    24.2% North European
    1.1% East European
    0% North African


    FR2

    12% East Mediterranean
    29% Atlanto-Mediterranean
    4.8% Middle Eastern
    52% North European
    0.5% East European
    1% North African


    French Basque (I'm using the most archetypal one)

    0% East Mediterranean
    99% Atlanto-Mediterranean
    0% Middle Eastern
    0% North European
    0% East European
    0% North African

  3. Conclusion :

    Let's see the results on FR2, an average Frenchman I presume.


    K=6 (with Sardinians)

    ME/Bedouin - 1.1%
    Northern European - 57.1%
    ME/Palestinian - 4.5%
    SW European - 34.6%
    ME/Druze - 2.4%
    North African - 0%


    K=6 (without Sardinians)

    12% East Mediterranean
    29% Atlanto-Mediterranean
    4.8% Middle Eastern
    52% North European
    0.5% East European
    1% North African

    Results are pretty similar as far as the European components go. In the first run, SW European was defined by the Sardinians : FR2 scores about 35%. In the second run, Atlanto-Med was defined by the French Basques : FR2 scores about 29%. From FR2's point of view, whether it is defined by the Sardinians or by the Basques, that component is much the same.

    Let's see my results :

    K=6 (with Sardinians)

    ME/Bedouin - 0%
    Northern European - 44.18%
    ME/Palestinian - 0%
    SW European - 54.93%
    ME/Druze - 0.87%
    North African - 0%

    K=6 (without Sardinians)

    2.4% East Mediterranean
    68% Atlanto-Mediterranean
    3.7% Middle Eastern
    24.2% North European
    1.1% East European
    0% North African

    In that case, the results begin to differ : in the first run, I'm about 55% SW European, with SW European being defined by the Sardinians. In the second run, I'm 68% Atlanto-Med, with Atlanto-Med being defined by the French Basques. What can we say ? That FR2 was less sensitive than I am to the exclusion of the Sardinians. Conversely, my NE results are strikingly reduced, which must mean that the French Basque sample contains genetic variation that was assigned to NE in the first run and to Atlanto-Med in the second.

    Let's see one French Basque sample.

    K=6 (with Sardinians)

    ME/Bedouin - 0%
    Northern European - 37%
    ME/Palestinian - 0%
    SW European - 63%
    ME/Druze - 0%
    North African - 0%

    K=6 (without Sardinians)

    0% East Mediterranean
    99% Atlanto-Mediterranean
    0% Middle Eastern
    0% North European
    0% East European
    0% North African

    With these results, it's quite obvious that the Sardinian cluster in the first run did not account for the Basque one in the second run. Actually, the difference is about 37%, i.e. the NE share from the first run that is now assigned to Atlanto-Med. Such changes barely affect people who are rather distinct from these populations, but as soon as you tend towards them, your results are distorted.

    As a conclusion, Polako still hasn't found the proper reference samples : when Sardinians are included, the algorithm doesn't use the Basques as a reference population.

    NB : What do you think of the K runs of Behar et al 2010 ? They're rather similar to Polako's latest run.

    Pic
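
The comparison in comment 3 boils down to subtracting one run's percentages from the other's. Here is a minimal Python sketch of that arithmetic, using the figures quoted in the comment. Pairing "SW European" with "Atlanto-Mediterranean" and "Northern European" with "North European" follows the commenter's own reading and is an assumption, since the two runs define their clusters independently. The names `runs` and `shift` are illustrative.

```python
# Percentages quoted in comment 3, stored as (with Sardinians, without Sardinians).
# "south_west": SW European in the first run, Atlanto-Mediterranean in the second.
# "north": Northern European in the first run, North European in the second.
# Treating these labels as the same component across runs is an assumption.
runs = {
    "FR1": {"south_west": (54.93, 68.0), "north": (44.18, 24.2)},
    "FR2": {"south_west": (34.6, 29.0), "north": (57.1, 52.0)},
    "French Basque": {"south_west": (63.0, 99.0), "north": (37.0, 0.0)},
}


def shift(sample):
    """Change, in percentage points, of each paired component once Sardinians are removed."""
    return {
        name: round(after - before, 2)
        for name, (before, after) in runs[sample].items()
    }


for sample in runs:
    print(sample, shift(sample))
```

FR2 moves by about 5 points on both components, FR1 by about 13 and 20, and the French Basque sample by 36 and 37. That is the pattern the comment describes: the closer a sample sits to the reference population, the more its results move when the references change.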

  4. I don't think that defining a SW European cluster based on Sardinians makes any sense. After all, I'd consider the Italian region intermediate (and somewhat distinct, especially in isolated Neolithic pops. like Sardinians) between SW Europe, NW Europe, Central Europe and SE Europe.

    If the concept of SW Europeans is to have any meaning, it should be defined by Iberians, South French ('Occitans') and/or Basques-Gascons. This I guess is what he calls Atlanto-Med. Sardinians are anyhow anomalous (Neolithic origins with some strong founder effects and relative isolation) and, like Orcadians (mixed Scot-Norwegian), should not be used as reference, IMO.

    Much better to use populations that are consistently homogeneous and not rare isolates. Better Scottish Highlanders or Irish than Orcadians, better Gascons or Iberians than Sardinians. Neither Orcadians nor Sardinians are likely to have been the origin of any other population AFAIK.

    Also I'd say that FR2 looks Northern French, while FR1 looks more Southern, I doubt there's anything like an "average Frenchman", genetically speaking. Northern French seem to cluster best with NW Europeans on average, while Southern French look more like Pyrenean Iberians or the other way around. However this should be better researched to clearly assess it.

    "As a conclusion, Polako still hasn't found the proper reference samples : when Sardinians are included, the algorithm doesn't use the Basques as a reference population".

    That kind of bias keeps him blind. :P

  5. "What do you think of the K runs of Behar et al 2010 ? They're rather similar to Polako's latest run".

    That they do not reach enough K-depth to express the structure of Europe. It's OK for getting an idea about West Eurasia overall, and enough, it seems, to identify Palestinians as a distinct population, but in Europe the structure remains at the equivalent of K=3, so Bauchet 2007 is a much better reference. They should have done more runs, especially if they expected or wanted to identify a Jewish-specific or Jewish-plus-others cluster.

    I think that the three clusters apparent among Europeans (northern, Caucasian and Mediterranean) do not reflect the complexity we know exists in Western Europe (or Europe as a whole). The greater diversity of West Asia also obscures the European personality pretty much, with only one or two European-specific clusters apparent at all.

    In K-runs it's common to get, at shallow levels, a false appearance of admixture, because the components only reflect minor affinities and not autonomous regional personalities. We know that Basques and Iberians easily show up as distinct clusters, but that's not apparent at all in Behar 2010 (which doesn't matter, because they were primarily studying West Asians).

  6. Polako has updated his blog :

    Intra-Southern European MDS maps (including Joe, Vincent and Dan from Genomes Unzipped)

    As I said to Polako, Plagnol is not a Catalan surname but a Languedocian one, more precisely a South-Central name distributed in modern-day Ardèche, Gard and Hérault. Its meaning is straightforward : "little plateau", from local Occitan planh=plateau + diminutive suffix -òl. Since Catalan and mainstream Occitan are essentially the very same language, it's true that Planyol - as Catalan people would write it - could be a Catalan surname, but it's not attested : the map is clear. A French oïlic version of this family name would be Plagneau, but it doesn't exist as such. In Limousin, one can find Plagnaud.

    To be precise, it seems that there are autochthonous people named Plagnol in Lot as well (Guienne, SW France). One would have to contact Vincent Plagnol to know precisely where he originates from.

    I'm still FR1 and I only appear in a map featuring French Basques : I'm rather isolated in between the first Spaniards on the left and the French Basques on the right. I suppose that this is a correct position for Gascon people, one that could be refined with more samples from the Franco-Cantabrian region. If "Vincent Plagnol" is purely Languedocian-descended, his very low Basque affinities are a bit of a surprise. On the second plot, he clusters with Spaniards though.

    Map 1

    According to the words of Polako : "The first plot basically shows how much "Basque" (right) and then "Italian" (down) influence you have, while the second looks at "Italian" (right) and probably "Central European" (down)."

  7. Occitan, you're surely right. I really find it difficult to make a clear difference, but obviously the spelling with 'gn' instead of 'ny' should have given me the clue.

    Anyhow, you say, well... Polako said:

    "The first plot basically shows how much "Basque" (right) and then "Italian" (down) influence you have, while the second looks at "Italian" (right) and probably "Central European" (down)."

    The second axis actually shows distance between Italian and SW Europe (France AND Iberia). The axis is best defined for France but only slightly so. And we know (Bauchet'07) that Iberians have a very low Central European component anyhow. On this axis Basques tend to cluster best with Franco-Iberians (but only slightly so, and it varies a lot between individuals). You too.

    This would be much more interesting without Italians, because Italians are probably too much of an outgroup by comparison and do not help resolve the French-Iberian axis, which probably does exist to some extent. If another group is considered convenient, I'd rather use Belgians or Brits - or even Germans. But in order to get a nice feeling of SW European structure, best would be to use only SW European samples, i.e. from modern France, Spain and Portugal - best if regional origin info is known.

  8. Indeed regional info would be great. What I know :

    - The French sample comes from Lyons and is labelled "French (various regions) relatives".

    CEPHB

    - As I already told you, the French Basque sample is not properly located, my feeling is that it's Lower Navarre or inland Labourd (as Soule never attracts scientists). Let's add that on 23andme, a half-Labourdine Basque half-British woman with whom I'm sharing is somehow halfway between those French Basques and the Brits. Still, I could be wrong about Souletine people being more peripheral.

    On a side note, what did you think of this study's results dealing with GM haplotypes ? (Figure 3)

    Biomedcentral

    - FR1 : that's me. FR2 : still unidentified.
    - ES2 : half-Aragonese (where ? I've got his family surnames though, it could help), half-Valencian
    - PT1 : Minho
    - PT2 : Baixo Alentejo
    - FCA1 : French Canadian (with remote Irish input)

    As for Vincent Plagnol, that's him : Link.

  9. I don't know how much it really matters whether the Basque sample is from Behenafarroa or from Zuberoa. The difference should be rather small, though of course Souletins should be somewhat closer to Béarnais, from the viewpoint of geography at least.

    As for the paper, I'm not sure what to think of immunoglobulin genetics, which might be somewhat distorted by localized founder effects and such. Also I'd expect a paper focusing on Galicia to have provided several different samples of Galicians, as well as nearby peoples, such as North Portuguese, Asturians and Leonese. For that reason, the paper actually seems to provide more info on Basques and other Pyrenean peoples than on Galicians themselves.

    However it does seem that at least this Galician sample clusters best with Valencians, as well as Portuguese, suggesting its belonging to the Iberian rather than Franco-Cantabrian (or at least Basque-Pyrenean) cluster.

    Three clusters or vectorial trends seem to appear in the European PCA: Eastern Med, Basque-Cantabrian-Pyrenean and NW European. The Iberian cluster does not appear well defined, nor does the Central European one, even if these are known to exist at wider analysis of haploid genetics. This clearly suggests a limitation of the power of analysis of single genetic markers but well, it's ok as long as you don't want to read too much on this alone. One should look at population genetics from various angles before judging.

  10. This is what Vincent Plagnol states on Polako's blog :

    "Thanks for that, I really appreciate the thorough analysis. For what it's worth the name "Plagnol" certainly comes from the South, but I am not sure how far south. I was told once that the city of Joyeuse in Ardeche has quite a few Plagnols in the local cemetery but I never checked. Otherwise 25% comes from Brittany and the rest is a mixed bag from France which I cannot trace at this point (but I would like to). Overall it is highly consistent with your graphs, and I really like looking at these. "


I've chosen to let people comment freely on my posts. Nevertheless, you'll waste your time taunting me and calling me a fascist (which I'm really not) : please read my introduction, which should reassure you that my intentions genuinely aim at amateur knowledge. I understand that you may not share my passion for the history of the peopling of the World; just don't let me know, as the clear conscience gained by bashing a humble documentary work is useless.