Wikipedia talk:Persondata

Template:Persondata has been protected indefinitely. Use {{editprotected}} on this page to request an edit.


German Wikipedia sister project

This project was inspired by the Personendaten Project on the German Wikipedia. For more information about that project, see this article. Kaldari 15:45, 24 December 2005 (UTC)

Example

Can someone link to an example?? CoolGuy 17:41, 24 December 2005 (UTC)

Some examples of articles using persondata include Ferdinand Magellan, George W. Bush, and Ross Winn. Keep in mind that you won't actually see anything on the article pages unless you have metadata turned on in your user stylesheet. You can still see the persondata through the editing interface, though. Kaldari 18:27, 24 December 2005 (UTC)
The What Links Here page for the Persondata article includes the pages that use it (as well as some places where it's being discussed). *Dan T.* 18:30, 24 December 2005 (UTC)

Visible despite empty User/monobook.css

My stylesheet is unmodified: User:Jeandré/monobook.css, but I can still see these entries in the displayed articles for the 3 examples listed above. Am I seeing things? — Jeandré, 2005-12-25t06:16z

Same thing here, I see the persondata table and I never modified my stylesheet. Fbergo 14:36, 25 December 2005 (UTC)
I have the stylesheet mod installed myself, and when I log out the persondata table vanishes as it's supposed to. What browser are you using? Maybe this situation is browser-specific. *Dan T.* 15:04, 25 December 2005 (UTC)
Logging out makes the persondata table disappear, but I believe it wasn't supposed to appear when logged in, unless we explicitly edit our stylesheets. I'm using Firefox 1.0.5 on Linux, but it doesn't seem to be browser-specific (the behavior is the same in Firefox 1.5). Fbergo 15:54, 25 December 2005 (UTC)
Gecko 20051010, Firefox 1.0.4, Ubuntu Linux package 1.0.7. — Jeandré, 2005-12-25t18:52z
MonoBook is now not showing it. I went thru 7 skins, and none of them show it despite the template not changing. For a while there my links were underlined, so maybe changes happening with MonoBook. — Jeandré, 2005-12-25t19:12z
MonoBook not changed. — Jeandré, 2005-12-25t19:20z

Whether the entries are hidden depends on a CSS definition in Wikipedia:Common.css (which only admins can edit). If for some reason this CSS definition is not active, the entries will show up for everyone, whether logged in or not. I'm not sure what's been going on with the Wikipedia CSS lately, but I've noticed some weirdness too. Probably some kind of development work going on. Kaldari 20:54, 25 December 2005 (UTC)

Is anyone still seeing the persondata tables with unmodified user stylesheets? Kaldari 21:00, 25 December 2005 (UTC)

Not here, the issue is fixed for me (it required cleaning the browser cache, however). Fbergo 14:50, 26 December 2005 (UTC)

I don't even have an account, but they are showing up for me still. This is on Camino 1.0. 07:15, 21 August 2006 (UTC)

ALL CAPS?

Any reason for making the field names ALL CAPS, and for small-capsing the name? — Jeandré, 2005-12-25t06:16z

The field names are in all caps so that the information can be easily differentiated from information in the article body when constructing SQL queries to extract the data. The small-caps for the name is just for prettiness. Kaldari 20:57, 25 December 2005 (UTC)
I got rid of the smallcaps for the name since it didn't really add anything and looked out of place. Kaldari 00:07, 28 December 2005 (UTC)
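As a rough illustration of why the all-caps convention helps extraction, here is a minimal Python sketch (not the SQL approach mentioned above) that picks out only all-caps `|FIELD=value` lines. The regular expression and the function name `parse_persondata` are assumptions made for this example; the sample block reuses the Philip of Poitou fields from this page.

```python
import re

# Sample block; wrapping it in {{Persondata ... }} is assumed for illustration.
SAMPLE = """{{Persondata
 |NAME=Poitou, Philip of
 |ALTERNATIVE NAMES=Poitou, Phillip of; Poitou, Philippe of
 |DATE OF DEATH=[[22 April]], [[1208]]
 |PLACE OF DEATH=[[Durham, England]]
}}"""

# Match only lines whose field name is upper-case letters and spaces.
FIELD_RE = re.compile(
    r"^[ \t]*\|[ \t]*([A-Z][A-Z ]*?)[ \t]*=[ \t]*(.*?)[ \t]*$",
    re.MULTILINE,
)

def parse_persondata(block):
    """Return a dict of all-caps field names to their raw wikitext values."""
    return {name: value for name, value in FIELD_RE.findall(block)}

fields = parse_persondata(SAMPLE)
print(fields["NAME"])
```

Lower-case parameters such as `|name=` in an infobox do not match the pattern, which is the kind of separation the all-caps names are meant to provide.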

Flourished?

Ray Saintonge brought this up on wikien-l:

The one thing that I would make clear about the contents is that one can add years when an author flourished. For some authors we only know the years when they wrote, and know nothing of their lives before or after the time during which they wrote. Perhaps our German colleagues have already thought of that. [1]

This seems an excellent idea - especially for older historical figures, we know the period when they were active, but often their birth is not recorded... and if they then died obscure, and unrecorded, we've lost all biographical track of them. Something like, hmm...

Philip of Poitou:

 |NAME=Poitou, Philip of
 |ALTERNATIVE NAMES=Poitou, Phillip of; Poitou, Philippe of
 |FLOURISHED=1200 (or...
 |FLOURISHED=1191-1208 )
 |DATE OF DEATH=[[22 April]], [[1208]]
 |PLACE OF DEATH=[[Durham, England]]

Any idea on how best to implement this? It strikes me as useful - even if "flourished" is just a single year in the middle of the appropriate timeframe (here 1191-1208), it allows us to sort people relatively easily by period. Perhaps an optional line? It's good to get all these standardised early, I guess... Shimgray | talk | 21:21, 25 December 2005 (UTC)

It's maybe got too much subjectivity, though... do you need to cite references about when a person "flourished", in order to avoid WP:NOR? And would it apply to all sorts of biographical individuals, not just artists and writers? Did Adolf Hitler "flourish" when he had maximum power? Did Lee Harvey Oswald "flourish" when he assassinated Kennedy? *Dan T.* 21:53, 25 December 2005 (UTC)
Flourished is a very vague term, but basically means "the period we're actually interested in" - it's usually only needed for relatively obscure individuals. (See, say, Reginald of Durham for an example of "real use"). A different word may be appropriate - "active"? Shimgray | talk | 21:56, 25 December 2005 (UTC)
We would only be following standard artistic/encyclopedic practice to use "flourished" (usually just seen as "f.") - i.e. artists of one kind or another. I've never seen it used in non-artistic contexts. I can't imagine it being used for twentieth century or later figures such as Hitler or Oswald - the birth and death data is almost always available. Pcb21 Pete 17:12, 26 December 2005 (UTC)

Of course you can think about a lot of additional fields like "flourished" or "place of activity" but in my POV the time when someone lived is most essential. The parser I wrote to analyse the German Personendaten recognises dates like "12th century" or "1870s" or even "between 1560 and 1590". If you don't know the exact date of birth you should at least provide the year/decade/century/... -- Nichtich 00:56, 27 December 2005 (UTC)
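To make the kinds of loose dates mentioned here concrete, the following is a minimal Python sketch. It is not Nichtich's parser; the function name `parse_loose_date`, the supported patterns, and the choice to return an inclusive year range are all assumptions for illustration.

```python
import re

def parse_loose_date(text):
    """Return an inclusive (earliest_year, latest_year) tuple, or None."""
    text = text.strip().lower()

    # "12th century" -> 1101..1200
    m = re.fullmatch(r"(\d{1,2})(?:st|nd|rd|th) century", text)
    if m:
        century = int(m.group(1))
        return ((century - 1) * 100 + 1, century * 100)

    # "1870s" -> 1870..1879
    m = re.fullmatch(r"(\d{2,3}0)s", text)
    if m:
        start = int(m.group(1))
        return (start, start + 9)

    # "between 1560 and 1590" -> 1560..1590
    m = re.fullmatch(r"between (\d{1,4}) and (\d{1,4})", text)
    if m:
        return (int(m.group(1)), int(m.group(2)))

    # A plain year such as "1208"
    if re.fullmatch(r"\d{1,4}", text):
        return (int(text), int(text))

    return None  # unrecognised format

print(parse_loose_date("between 1560 and 1590"))
```

Returning a range rather than a single year keeps entries sortable by period even when only the decade or century is known.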

Thanks! I didn't know the parser was that good... If we don't need to provide a precise year for birth/death, then we can just stick in "born 12th century, died 13th", and this takes care of it - no need to add a separate datafield. Shimgray | talk | 01:31, 27 December 2005 (UTC)
Well, then you can also put everything in one field ;-) Parsing different date formats is difficult enough, and birth and death are different basic items, so I'd rather not mix them. -- Nichtich 10:36, 5 January 2006 (UTC)
"Flourished" usually means the period between the earliest and latest datable events during which the person was known to be alive. "Active" is sometimes used, as mentioned above. A "FLOURISHED" field would be an extremely useful addition to the template. I do a lot of work on medieval/Renaissance composers where "flourished" is all we have, and I don't want to make an original-research guess on a birth and death date. Antandrus (talk) 20:43, 15 April 2006 (UTC)

Instead of adding a category that is difficult to determine (as discussed above), why not incorporate this into the birth/death dates by using 'before' and/or 'after' modifiers? Where we know nothing of the birth, the first historical mention gives us the latest possible date of birth, and the last mention gives the earliest possible date of death. This could be indicated with "before"/"after" or "<"/">" in the existing date fields. Would this work well with the parsing? Bsnowball 14:01, 13 October 2006 (UTC)

I think that's a good idea. As long as the parsing engine was fairly intelligent I don't think it would cause any problem. Kaldari 15:52, 13 October 2006 (UTC)
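A minimal Python sketch of how a parser might read such modifiers, assuming the field holds an optional "before"/"after"/"<"/">" followed by a year. The function name `parse_bounded_year` and the use of None for an open bound are hypothetical choices, and the stated year is treated as the bound itself for simplicity.

```python
import re

def parse_bounded_year(text):
    """Return (earliest, latest); None marks an unknown (open) bound."""
    m = re.fullmatch(r"\s*(before|after|<|>)?\s*(\d{1,4})\s*", text.lower())
    if m is None:
        return None  # not a year with an optional modifier
    modifier, year = m.group(1), int(m.group(2))
    if modifier in ("before", "<"):
        return (None, year)   # only the latest possible year is known
    if modifier in ("after", ">"):
        return (year, None)   # only the earliest possible year is known
    return (year, year)       # exact year

print(parse_bounded_year("before 1191"))
```

For the Philip of Poitou example, a birth field of "before 1191" would then sort no later than 1191 without anyone having to guess a birth year.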

Use inside implementations of other templates

I've seen this template used inside the George W. Bush article, where it duplicates a subset of information already given in the President Infobox template. Could the Persondata template not be implemented inside the president infobox, thereby implementing it on 43 pages at once? OK, it means the scraping software has to be somewhat smarter, but then again that's fine because no scraping software is currently in production. Ditto for other templates that contain the relevant information. Pcb21 Pete 17:08, 26 December 2005 (UTC)

I suppose you only have President Infoboxes for presidents while all people have names, birthday and so on. Merging both templates is possible (I won't explain here how) but you'd need to modify both templates. -- Nichtich 00:42, 27 December 2005 (UTC)

p.s. the capital letters look ugly. Pcb21 Pete 17:08, 26 December 2005 (UTC)

The capital letters should only be visible in the source code. -- Nichtich 00:42, 27 December 2005 (UTC)

Good idea

  • I've added the Persondata template to about 40 biographical articles from my watchlist. --Dystopos 04:00, 28 December 2005 (UTC)

Awesome

I've been thinking about this ever since my Wikipedia:Semantic Wikipedia proposal. :D I'll start work on this! --Golbez 20:15, 28 December 2005 (UTC)

I guess the main thing we need to do now is work on publicity. Like maybe launching an implementation campaign and getting an article in the Signpost. Anyone want to help work on publicity? Kaldari 21:16, 28 December 2005 (UTC)
A pretty simple method to generate publicity would be to post a link to this place in the Edit Summary: +Persondata [[Wikipedia:Persondata]]. The autoinsert function works in your favor, you would only have to type +p . As the edit summary is constantly checked, this will draw curious eyes and perhaps even more collaborators. So this will attract people exponentially. It's more fun to integrate persondata than to check for spelling errors. Longbow4u 17:51, 30 December 2005 (UTC)
Ergo you should add this to articles about famous people first, which will be on many watchlists. -Splashtalk 23:04, 1 January 2006 (UTC)
Jojo, noted. Longbow4u 23:45, 1 January 2006 (UTC) Done. Longbow4u 23:53, 1 January 2006 (UTC)
Well, it looks like our exponential growth didn't happen. The number of articles with persondata has stagnated at ~150 since the new year began (despite being added to the higher profile biographical articles). Anyone have ideas for how to get things moving? Should we start a WikiProject? Any advice from the German Wikipedians? Kaldari 23:39, 9 January 2006 (UTC)
Is there an easy way to see a list of the articles with Persondata? Or better yet, an implementation of using it as a dataset? --Dystopos 00:01, 10 January 2006 (UTC)
Special:Whatlinkshere/Template:Persondata. -Splashtalk 00:34, 10 January 2006 (UTC)
OK. So you just count them by hand? Maybe if we showed a cool use for this, like automatically generated timelines showing lifespans of presidents, then people would jump on board. --Dystopos 03:00, 10 January 2006 (UTC)
If you're an admin, grab WP:AWB. It can count a whatlinkshere for you. -Splashtalk 03:50, 10 January 2006 (UTC)
As soon as gets a newer database dump, we'll be able to actually demonstrate pulling persondata. Kaldari 05:40, 10 January 2006 (UTC)

Titled persons

How should we enter the name of people known principally by their title e.g. Louis II, Duke of Bavaria? Clearly, the alternative names are easy to deal with, but I'm not sure on 1)How such things are alphabetised anyway and 2)Whether we should use their 'actual' name as the name and their title as an alternative. Does have a view on this already? -Splashtalk 22:55, 1 January 2006 (UTC)

I believe the titles should be placed after the name separated with a comma, i.e.:
  • Louis II, Duke of Bavaria
  • Rothman, John, III
  • King, Martin Luther, Jr.
Kaldari 23:14, 1 January 2006 (UTC)
Ok, thanks. And what of Elizabeth Alexandra Mary Windsor and others such? -Splashtalk 23:32, 1 January 2006 (UTC)
Here's what they did in the German Wikipedia:
NAME=Windsor, Elizabeth Alexandra Mary
Kaldari 23:57, 1 January 2006 (UTC)
That seems eminently sensible, and was what I had in mind. Probably worth crafting guidance to that effect for the main page, without engaging in instruction creep. -Splashtalk 00:05, 2 January 2006 (UTC)

For what it's worth, the Library of Congress authority records have "Windsor, Charles, Prince of Wales, 1948- see Charles, Prince of Wales" and an actual heading at the second title. (Liz is under "Elizabeth II, Queen of Great Britain" and twelve thousand references)
AACR2 is the standard English-language "bible" for library cataloguers; I don't have access to a full copy right now (give me a few days), but the short edition has some rather useful recommendations - we could certainly do worse than adopt these, being as they are a major 'academic metadata' standard... Would there be any interest in me drawing up a short version of this "translated" into our context? It'd certainly help talking to external databases...
Usefully, for us, it discusses titles and suchlike -
  • Include any titles of royalty or nobility that usually appear as part of the name.
  • For a noble, we give their "proper name in [their] title of nobility" - so Lord Byron, George Gordon Byron, 6th Baron Byron would be [NAME=Byron, George Gordon Byron, Baron] (best to drop numbers); if he had been "Lord Byron of Greece", then he'd be filed under [NAME=Byron of Greece, George Gordon Byron, Baron]. If they're not identified by a title of nobility, but have no surname, we list them by most common form of name, then any identifier - so this would be [NAME=John Paul II, Pope], [NAME=Elizabeth II, Queen of Great Britain], [NAME=John, the Baptist], [NAME=Francis, of Assisi, Saint]
  • If they're not commonly referred to by a noble title - someone like John Buchan or Bertrand Russell, then just deal with the name as you would a normal individual.
There's a lot of other personal-naming rules - I can, as I've said, draw up a sort of style guide for this if there's interest... but, conversely, it might be best to go with the German standard, if there actually was one (I assume so). Ho hum. And I certainly don't want to engage in rulecreep... Shimgray | talk | 00:42, 2 January 2006 (UTC)
I think adding such guidance in the article space would be very useful. Perhaps a subsection for "Names" under "Data fields" after the examples. The German page has a small section devoted to instructions for each data field, so I think it would be fitting for us to do the same (perhaps taking guidance from the German instructions if anyone knows German). I think referring to the AACR2 might also be helpful, although I am not familiar with it myself. Kaldari 00:51, 2 January 2006 (UTC)
Right. I'm vanishing for a day or three - moving house - but I'll see if I can put something together for when I'm back online. Shimgray | talk | 01:08, 2 January 2006 (UTC)
I'd forgotten all about this... how does User:Shimgray/Metadata look? The problem is that we have all these people with noble titles on en:wiki... Shimgray | talk | 00:05, 15 January 2006 (UTC)
A few comments:
  • The paragraph about Lord Byron was a bit hard to follow. Not quite sure how to make it easier though.
  • Beethoven should definitely be "Beethoven, Ludwig van". Any time the middle name (or pre-surname) is lowercase it should go at the end rather than the beginning, as you would never alphabetize Beethoven as "van Beethoven". Charles de Gaulle, however, is tricky as "de" is sometimes capitalized in his name and sometimes not. I've seen his name alphabetized both ways. I think it's important to emphasize that the main purpose of the name arrangement is alphabetizing. That should make most cases common sense.
  • A better example for the double surname would be Townes Van Zandt as he is always referred to and alphabetized as "Van Zandt", never "Zandt". George MacDonald Fraser is a much more ambiguous case.
  • We might need to explicitly explain the "Brutus of Troy" example.
Kaldari 05:34, 15 January 2006 (UTC)

After doing more research on name sorting I realized that it's a lot more complicated than I thought. There are so many exceptions and ambiguous cases (especially regarding Arabic, Hebrew, and German names) that I think it might be best to avoid trying to catalogue specific rules here, and instead use a distributed approach: when in doubt, ask those familiar with the subject. To this end, I would like to propose the following instructions:

When specifying the person's name, use the following format: surname, forename middle names, title. For most cases this will be straightforward, for example, "George Walker Bush" becomes "Bush, George Walker". In some cases, however, there may be ambiguity about a person's surname. When in doubt, format the name according to how you would expect it to be alphabetized. For example, Ludwig van Beethoven would be alphabetized under "Beethoven", while Townes Van Zandt would be alphabetized under "Van Zandt". If you're not sure, ask someone familiar with the subject how they would alphabetize the name or consult a cataloguing guide such as the AACR2.
It is usually a good idea to list as much of a person's name as possible in the name field to avoid confusion with similar names. Do not include honorifics (such as "Dr.", "Professor", or "PhD"), however, unless they are part of a title of nobility.
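The straightforward cases in these instructions can be sketched in Python as follows. This is only a heuristic under stated assumptions: the `PARTICLES` set and the function name `sort_name` are hypothetical, a capitalised particle is assumed to belong to the surname, and ambiguous names (Charles de Gaulle, George MacDonald Fraser) still need a human decision.

```python
# Hypothetical list of surname particles, chosen for illustration only.
PARTICLES = {"van", "von", "de", "der", "den", "la", "le", "di", "da"}

def sort_name(full_name, title=None):
    """Format a name as 'surname, forename middle names, title'."""
    parts = full_name.split()
    if len(parts) < 2:
        result = full_name  # single names are left unchanged
    else:
        start = len(parts) - 1  # the surname starts at the last word
        # Pull capitalised particles ("Van") into the surname, but always
        # leave at least one forename in front.
        while (start > 1
               and parts[start - 1].lower() in PARTICLES
               and parts[start - 1][0].isupper()):
            start -= 1
        surname = " ".join(parts[start:])
        forenames = " ".join(parts[:start])  # lowercase particles end up last
        result = f"{surname}, {forenames}"
    return f"{result}, {title}" if title else result

print(sort_name("Ludwig van Beethoven"))
print(sort_name("Townes Van Zandt"))
```

Because a lowercase particle stays with the forenames, "van" lands at the end of "Beethoven, Ludwig van", while the capitalised "Van" keeps "Van Zandt" together.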

Hopefully this will cover most cases and avoid instruction creep. Kaldari 19:25, 19 January 2006 (UTC)

Lighthearted uses?

What shall we do about things like User:CCMichalZ? Have a sense of humour, or actively discourage since it makes the intended usage rather harder to filter? -Splashtalk 00:28, 2 January 2006 (UTC)

Seems relatively trivial to only search in the main namespace, surely? And a lot of the applications for this will be done on a dump of only that namespace... I can't see it causing a practical problem. Shimgray | talk | 00:43, 2 January 2006 (UTC)
On the other hand, a firm decision on whether or not to put this next to fictional individuals would be handy. A lot of people will quietly bend the rules on that one, and it could well corrupt the data just a touch. Shimgray | talk | 00:45, 2 January 2006 (UTC)
In German Wikipedia we don't use persondata for fictional characters. You can argue in some cases (people from the ancient times, characters of the bible...) but we definitively don't take homer simpson and things like that. -- Nichtich 10:40, 5 January 2006 (UTC)

Biodata?

Could this be an idea? Creating something like this for organisms and such? Example:


And for plants:


And I think Sub- and Super- prefixes should be added too. They can always be left blank. Pyramide 13:03, 3 January 2006 (UTC)

Sounds like a good idea, although I wonder if the same function could already be served by the taxobox. It looks like the vast majority of species articles have taxobox templates included, and the formatting on them is pretty consistent. It seems like you could theoretically already harvest such information from the taxoboxes. Any second opinions? Kaldari 13:32, 3 January 2006 (UTC)
There are taxoboxes for every plant, aren't there? I introduced Persondata because there was no way to automatically extract the basic data of people out of the article text (and I hope you don't really want "personboxes"). You should first try if parsing taxoboxes is possible instead of parsing biodata templates. -- Nichtich 10:46, 5 January 2006 (UTC)

Not picked up by Google?

Louis Braille has a Template:Infobox Person. Google recently linked a search for Louis Braille birthday from their home page. The Wikipedia data is listed above Google's news and first web hit.

Ferdinand Magellan has Persondata. Googling for Ferdinand Magellan birthday doesn't show the wikipedia article in the first 100 hits, but it does show the Wikipedia:Persondata page.

Wouldn't it be better to not have data duplicated (normalized), and displaying the single instance of the data? -- Jeandré, 2006-01-07t10:45z

You are correct that Persondata duplicates Infobox Person, in the same way that the proposed Biodata would duplicate the Taxonomy Infobox. The difference is that the Taxonomy Infobox is included in all species articles, while Infobox Person is not included in all biographical articles. There was a push a while back to implement Infobox Person more widely, but the push was met with resistance from those who felt the template was largely redundant with content already in the articles. At one point Infobox Person was even nominated for deletion I believe. There are also many articles for which Infobox Person would be obviously inappropriate, for example George W. Bush which uses Infobox President. There are numerous specialized Infoboxes for people that would make universal use of Infobox Person untenable as it would lead to redundant article content. Persondata gets around this problem because it is not article content, it is hidden metadata. Thus it could theoretically be adopted universally for all biographical articles. It's awesome that Google has figured out how to cull data from Infobox Person. It will be even better when they figure out how to cull data from Persondata (assuming Persondata is widely adopted as it was on the German Wikipedia). Ferdinand Magellan is a special case. The reason Google is linking to the Persondata page rather than Magellan's article is that Magellan is used as the main example on the Persondata page (and the Persondata page is heavily linked to, thus getting a high Google ranking). I hope that answers your questions. Kaldari 19:35, 7 January 2006 (UTC)

Hidden Metadata

I think the potential for indexing and crunching bare-bones biographical information via metadata is great, and I've added Persondata to 60 or so biographies on my watchlist. I start to wonder, though, whether having the template hidden invites errors (or worse, vandalism) to go uncorrected. Is there any way to use data tags in the article itself, so that changes to the article (both good and bad, but at least visible) would modify the data? I'm thinking of something along the lines of: '''{{person|John Doe}}''', also known as {{Altname|John Q. Public}} (born {{birthdate|[[April 27, 1940]]}} in {{birthplace|[[Chicago, Illinois]], [[United States of America]]|Chicago, Illinois}}) was a {{desc|pseudonymous palooka}}.

I don't know if that would be a lot more complex to aggregate, but it may have a few advantages as part of a wiki. Just thinking aloud. --Dystopos 23:59, 14 January 2006 (UTC)

I think this is a brilliant suggestion, as it also reduces duplication and can be used inside infoboxes. The only downside I see is that it might scare off newbies, tho infoboxes are surely scarier for them. (I've replaced the =s with |s above.) -- Jeandré, 2006-01-15t10:06z
See Template:Birthdate for a test implementation. Variable extraction doesn't work for the category yet, and for now it's only for RFC 3339 dates. -- Jeandré, 2006-01-15t10:26z
Here are my concerns:
  • Makes article text intimidating to edit.
  • Does not facilitate the primary purpose of persondata: alphabetical look-up of names. (This could be corrected, however, by a more complicated markup than what you have presented in the example.)
  • Would require seven or more template transclusions rather than one. Multiply that by tens of thousands of biographical articles and you create quite a deal of overhead for the servers.
Kaldari 22:14, 15 January 2006 (UTC)
  • Those concerns make sense. I would still rather see some way of making the metadata more apparent to someone reading/correcting the article. Here's another idea:
What if the Persondata were displayed in whatever infobox pertained to the article? By default a new generic biography infobox would be created, using just the PData. For articles that already use infoboxes (Presidents, etc), the fields that pertained to PData would be filled in by the PData, while other fields would still be filled in directly. --Dystopos 03:17, 18 January 2006 (UTC)
That would probably be the ideal way to do it, however it would require two difficult tasks:
  1. Inventorying and coordinating an unknown number of infoboxes
  2. Convincing editors that all biographical articles should have infoboxes
Kaldari 04:04, 18 January 2006 (UTC)
I don't know anything about programming. Is there no way for one string in an infobox to search for a value from metadata in the same article and return it without regard to how the infobox is set up?
What's the smallest, most elegantly unobtrusive infobox that could be used with PDATA? --Dystopos 05:39, 18 January 2006 (UTC)
Template:Infobox Biography is probably the most unobtrusive implementation. Efforts to widely implement this infobox, however, have been met with resistance. Kaldari 19:36, 19 January 2006 (UTC)
I can see the naysayer's point. That's a pretty big goofy thing to force into every article. Let's think... The value of the metadata is in being able to correlate and process and index the information. So anything added to the article that makes the metadata more visible should also bring some of that value. We probably need to develop some of the coolest uses for the data, and then see if a small infobox on each article makes a decent portal to those results AS WELL as being a place where the content specific to that article is made visible.
It's not easy to think of an example right off. Perhaps this data eventually makes the things like Category:1920 births obsolete, so the infobox IS the link to a table of 1920 births, which is sortable by lifespan, birthplace, description, or name. --Dystopos 20:26, 19 January 2006 (UTC)
That would be cool. I wonder what the Germans have done with their persondata. Kaldari 23:47, 19 January 2006 (UTC)
As for the Germans: it is possible to change the user's monobook.css sheet so that the metadata in the article are displayed. That way users can verify at a glance whether the metadata correspond with the data in the article. I have not heard about vandalism in Personendaten in Germany, though. The vandals want their changes to be seen. I don't think that they are outright sabotaging the project. It would be a massive sabotage effort, considering there are several thousand entries. It's not worthwhile. I did not implement that myself. Perhaps you should contact a developer for the details regarding the change in the monobook. Or ask German user Nichtich. Greetings from Germany, Longbow4u 20:17, 23 January 2006 (UTC)
I always assumed a majority of vandalism is spotted through diffs. Any change to Persondata will still show up in diffs so hopefully "hidden" vandalism will not prove to be too much of a problem. And besides, Persondata is usually at the very end of a long article, stuffed amongst loads of categories and interwiki links which vandals usually shy away from. --Happynoodleboy 15:21, 30 April 2006 (UTC)

Seasons in data fields

A clever program would be able to deduce the hemisphere from the birthplace, if necessary. --Dystopos 03:17, 18 January 2006 (UTC)

name / fullname

should e.g. Michael Jackson have name Jackson, Michael & altname Jackson, Michael Joseph or what? Derex 18:06, 19 January 2006 (UTC)

I would recommend using "Jackson, Michael Joseph" as this will help distinguish him from other Michael Jacksons. Kaldari 19:31, 19 January 2006 (UTC)

Names and alternative names

Should the name field have the subject's birth name, their legal name at the time of death, or the legal name that they're best known under? I'm not quite sure how to handle maiden names, women who were married multiple times, or people who had their names legally changed at some point -- particularly when they're best known by a pseudonym, which I assume should always go in the alternative names field. (And I see I forgot to sign this, sorry. Aitch Eye 17:17, 25 January 2006 (UTC))

I've been assuming that NAME should be the alphabetized version of the name in the first few words of the article, and that all the others can go in ALTERNATIVE NAMES, with appropriate (parenthetical) descriptions of what/why they are. -Splashtalk 17:28, 25 January 2006 (UTC)

Collaboration

I've been adding Persondata to bios I come across. Perhaps it's time for a group of like-minded Wikipedians to make this a group effort and give this a bit more publicity? Greentubing 08:25, 29 January 2006 (UTC)

I agree completely and would be interested in helping in any organized effort to promote this project. For now I have just mentioned it on my userpage in hopes that people will notice it. Also, I try to mention Wikipedia:Persondata in the edit summary someplace, hoping that will get it more exposure. --PS2pcGAMER (talk) 04:24, 6 February 2006 (UTC)

More efficient extraction query

Hi there. The current SQL query for extracting Persondata links relies on looking at the article text. This is very slow. I think I can rewrite it to use the templatelinks table to directly find pages that include {{Persondata}}. I'll update Wikipedia:Persondata if I come up with a suitable replacement. Mike Dillon 18:19, 29 January 2006 (UTC)

Here it is. I believe this directly finds all articles in the main namespace that include the {{Persondata}} template:

    -- Return the text from '{{Persondata' up to and including the first '}}'
    SELECT
        SUBSTRING(SUBSTRING(pages.cur_text FROM INSTR(pages.cur_text,'{{Persondata')), 1,
            INSTR(SUBSTRING(pages.cur_text FROM INSTR(pages.cur_text,'{{Persondata')),'}}')+1)
            AS 'Persondata'
    -- pd is the Template:Persondata page itself (namespace 10)
    FROM cur AS pd
    JOIN templatelinks AS tl
        ON pd.cur_namespace = tl.tl_namespace
        AND pd.cur_title = tl.tl_title
    -- pages are the main-namespace articles that transclude the template
    JOIN cur AS pages
        ON tl.tl_from = pages.cur_id
        AND pages.cur_namespace = 0
    WHERE pd.cur_namespace = 10
    AND pd.cur_title = 'Persondata'

I believe this does the same as the query currently on Wikipedia:Persondata, but it uses the database structures in place to do it much faster. Mike Dillon 18:32, 29 January 2006 (UTC)
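For anyone post-processing article text outside the database, the SUBSTRING/INSTR expression in the query above can be mirrored in Python roughly like this. The function name `extract_persondata` and the sample text are assumptions for the example.

```python
def extract_persondata(article_text):
    """Return the text from '{{Persondata' through the first '}}', or None."""
    start = article_text.find("{{Persondata")
    if start == -1:
        return None  # template not present
    end = article_text.find("}}", start)
    if end == -1:
        return None  # unterminated template
    return article_text[start:end + 2]  # include the closing braces

# Hypothetical article text for illustration.
text = "Intro text.\n{{Persondata\n|NAME=Winn, Ross\n}}\n[[Category:Anarchists]]"
print(extract_persondata(text))
```

Like the SQL, this stops at the first "}}", so a nested template inside a field value would truncate the result.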

As a data point, I did a comparison of counting Persondata inclusions with the old query that uses a text search against my version. The text search version has been running for over an hour and still hasn't returned a count. My version ran in about 1 minute (254 uses in main, one in namespace #4, whatever that is). Once the other one returns, I'll verify that the counts were the same, but I think the query above is a great improvement. Mike Dillon 19:18, 29 January 2006 (UTC)
Very cool. BTW, do you have any idea why the What Links Here page is not accurately reporting all the pages that include the Persondata template? I know there are about 300 articles that currently have the template, but the What Links Here page for the template only lists about 150 of them now. I know of several examples of articles that have the template that are not listed there, for example, Ross Winn. Kaldari 19:47, 29 January 2006 (UTC)
WLH is broken at present as far as counting template transclusions goes. It seems to be gradually correcting itself as articles containing templates are edited after the introduction of this bug, but it's a very annoying bug that the devs are being slow to correct. See Bugzilla bugreport. -Splashtalk 20:02, 29 January 2006 (UTC)

So, the other query finally completed, after at least 4 hours (wikisign doesn't tell you how long). It counts 365, probably because of the aforementioned Mediawiki bug. Mike Dillon 03:48, 30 January 2006 (UTC)

I'd have expected the bug to cause underreporting if it is at all based on Whatlinkshere's way of working. -Splashtalk 03:49, 30 January 2006 (UTC)
You expected right, because that's what happened. Counting via cur_text was 365, counting via templatelinks was 254 (111 articles less). Mike Dillon 03:55, 30 January 2006 (UTC)

Because the template has always included a link to Wikipedia:Persondata, I was actually able to find the missing articles by diffing WLH for that page and the template itself. The list is at Wikipedia:Persondata/Missing links. These can be fixed using "null edits" (available in the popups tool). I can refresh the diff later, but if anyone fixes one, please remove it from the list to avoid duplication of effort. Mike Dillon 05:24, 30 January 2006 (UTC)
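The diff described here amounts to a set difference between two What Links Here lists. A minimal Python sketch, assuming both lists are already available as page titles; the function name, the namespace prefixes, and the sample titles are hypothetical, and this is not necessarily how the actual list was produced.

```python
# Hypothetical prefixes used to weed out pages outside the main namespace.
NON_MAIN_PREFIXES = ("Wikipedia:", "User:", "Template:", "Talk:")

def missing_transclusions(links_to_project_page, links_to_template):
    """Titles that link to the project page but are not listed for the template."""
    missing = set(links_to_project_page) - set(links_to_template)
    return sorted(t for t in missing if not t.startswith(NON_MAIN_PREFIXES))

# Illustrative data only.
wlh_project = ["Ross Winn", "Ferdinand Magellan", "Wikipedia:Semantic Wikipedia"]
wlh_template = ["Ferdinand Magellan"]
print(missing_transclusions(wlh_project, wlh_template))
```

Each title that comes back is a candidate for a null edit.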

I'm going to see if someone with a bot can do the null edits off of my list. Mike Dillon 05:28, 30 January 2006 (UTC)
I asked User:Bluemoose to have User:Bluebot do it. Mike Dillon 05:31, 30 January 2006 (UTC)

By the way, the article count is 466 using Special:Whatlinkshere/Wikipedia:Persondata and weeding out links not from the main namespace. Mike Dillon 06:45, 30 January 2006 (UTC)

After Bluebot's help, Special:Whatlinkshere/Template:Persondata now shows 466 inclusions from the main namespace. I guess that means the templatelinks query will work fully after the next database dump. Mike Dillon 16:20, 30 January 2006 (UTC)

The WhatLinksHere bug seems to be fixed now. Kaldari 02:58, 6 February 2006 (UTC)

Standardized XML

Should this not be modified to use a format which is already standardized beyond merely the scope Wikipedia, like RDF or another XML metadata format? Michael Z. 2006-01-30 05:34 Z

MediaWiki does output XHTML (at least for my browser), so perhaps RDF could be embedded using namespaces. However, I'm not sure that MediaWiki would support that and I don't think browsers are very good with mixed namespace XHTML yet. If the {{Persondata}} template provided an XML ID, however, RDF could certainly be derived from the resulting XHTML using a simple XSLT stylesheet. Mike Dillon 07:09, 30 January 2006 (UTC)
I've added id="persondata" to the template. This will allow more precise CSS and XSLT rules to be written for the Persondata output. Mike Dillon 07:14, 30 January 2006 (UTC)
Perhaps class="persondata" is a better idea. Sooner or later someone will put two of these on a page like Pokrass brothers, and then the HTML will not validate or extraction scripts will break. Michael Z. 2006-02-06 06:20 Z
I know that people will do strange things, but part of the problem is that the instructions on Wikipedia:Persondata don't address the case of multi-person articles. The case you brought up doesn't fit the classic case because both of the brothers actually have articles of their own that should have the Persondata on them. However, there are definitely more difficult cases like Mary-Kate and Ashley Olsen (or Chang and Eng Bunker). These articles will not be split in the foreseeable future, so should they have one Persondata block or two? I'd actually be in favor of saying that it should be one Persondata block, since the case of it actually being correct to leave multiple people in one bio is so rare (look at Category:Multiple people compared to the huge number of single-person articles). Leaving it as an id allows us to have snazzy Javascript that uses getElementById as was done by User:Sherool below. As far as I know, there is no decent equivalent except a full DOM traversal for finding elements by class. Mike Dillon 03:40, 7 February 2006 (UTC)
I guess if we relied on it being a table, we could find elements by tag name "TABLE" and then filter for the ID, but this still involves slightly gross code. Mike Dillon 03:42, 7 February 2006 (UTC)
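To make the trade-off concrete, here is a rough sketch of the filter-by-tag-name approach, written against a class instead of an id. It assumes a hypothetical class="persondata" on the table (the template currently sets an id), and the helper name is made up for illustration:

```javascript
// Return every TABLE under `root` whose class attribute contains
// `className` as a whole word. `root` is anything that offers
// getElementsByTagName, for example the document object.
function getTablesByClass(root, className) {
  var tables = root.getElementsByTagName('table');
  var matches = [];
  for (var i = 0; i < tables.length; i++) {
    var classes = (tables[i].className || '').split(/\s+/);
    if (classes.indexOf(className) !== -1) {
      matches.push(tables[i]);
    }
  }
  return matches;
}
```

More than one match on a page would mean the template was used twice, which is the case the id cannot represent validly.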
Sounds reasonable, Mike, but we should take steps to prevent people from using the template twice on a page, like a big red capitalized notice.
Is there an easy way to find a page with two or more occurrences of a template? Michael Z. 2006-02-07 22:35 Z
I'm not sure where you want the big red notice, but a variant of the extraction query on the project page can be used to find multiple occurrences. This can be run whenever there is a new database dump, but I don't know if there is any way to do it more frequently. We could probably make a CSS rule using adjacent sibling selectors to flag multiple uses of Persondata. It would look like:
TABLE[id="persondata"] + TABLE[id="persondata"] { border: red 2px solid }
Not sure if that would work across the board because "id" is special, but it works for me in a Mozilla-based browser in Linux. I tried playing around with the "before" pseudo-element and the CSS "content" attribute to generate a warning message, but it wasn't working. Also, this only works if the two tables are immediately adjacent. Mike Dillon 06:43, 8 February 2006 (UTC)

The following query should find multiple uses of {{Persondata}} in an article:

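SELECT pages.cur_title  -- the SELECT list was missing from the post; selecting the article title is an assumption
FROM cur AS pd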
JOIN templatelinks AS tl
   ON pd.cur_namespace = tl.tl_namespace
   AND pd.cur_title = tl.tl_title
JOIN cur AS pages
   ON tl.tl_from = pages.cur_id
   AND pages.cur_namespace = 0
WHERE pd.cur_namespace = 10
AND pd.cur_title = 'Persondata'
AND LOCATE('{{Persondata', pages.cur_text,
    INSTR(pages.cur_text,'{{Persondata') + LENGTH('{{Persondata')) > 0

I haven't tested it and I'm not sure if MySQL is smart enough to use the templatelinks before it starts churning through all the cur_text. Mike Dillon 06:52, 8 February 2006 (UTC)
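The LOCATE/INSTR condition in the query simply asks whether a second '{{Persondata' occurs after the first one. The same check on a single page's wikitext might look like the following JavaScript sketch. The function names are hypothetical, and like the query it is a plain substring match, so a template whose name merely starts with "Persondata" would be counted too:

```javascript
// Count how many times '{{' + templateName appears in the wikitext.
function countTemplateOpenings(wikitext, templateName) {
  var needle = '{{' + templateName;
  var count = 0;
  var pos = wikitext.indexOf(needle);
  while (pos !== -1) {
    count++;
    // Continue searching after the end of the match just found.
    pos = wikitext.indexOf(needle, pos + needle.length);
  }
  return count;
}

// True when the template opening appears at least twice.
function hasMultipleUses(wikitext, templateName) {
  return countTemplateOpenings(wikitext, templateName) > 1;
}
```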

Accessibility

This is hidden from visual browsers by CSS, but not from text-only browsers or from many screen readers for the disabled. Hiding this using only CSS breaks a WAI priority 1 accessibility checkpoint: "Organize documents so they may be read without style sheets" [2].

One solution would be to hide the code inside HTML comments <!-- ... -->, which takes it out of the document scope altogether. Michael Z. 2006-01-30 05:41 Z

If the guidelines that are on the Wikipedia:Persondata page are followed (that is, putting Persondata at the very bottom of the article before interwiki links), the documents are organized so that they may be read without style sheets. It is no more of a problem than the presentation of categories. This isn't like the hiddenStructure problem, where delimiter text and other unintelligible stuff in the middle of the text isn't really hidden. The output of Persondata is intelligible, if redundant. To improve the situation, however, the metadata CSS class should have "speak: none" added, as was done in a half-hearted attempt to fix hiddenStructure. Mike Dillon 06:34, 30 January 2006 (UTC)
Done. Kaldari 06:46, 30 January 2006 (UTC)
That's a good step, but browsers and screen readers are not required to process the CSS at all (older versions of Jaws ignore display:none; do they ignore speak: too?). If the metadata is redundant and undesirable for sighted users of visual browsers, then it should be taken out of the document stream for everyone, especially screen-reader users, who may have a harder time reading at all than other users. Michael Z. 2006-01-30 07:12 Z
What about the rest of the stuff in the document stream like interwiki links and all the other non-article links on the page? Is the data in the Persondata table really more of a problem for screen readers than those? It seems like the only thing that will help screen readers that don't understand CSS is for MediaWiki to have a mode or preference so that only the document text can be shown for such screen readers. Mike Dillon 07:17, 30 January 2006 (UTC)
What do you not understand about equivalent accessibility? If sighted users need access to interwiki links, then text browser and screen reader users need them too. If we hide metadata for the convenience of visual browser users, then they should be removed from the document for the disabled, too. Apparently the metadata is more of a problem than interwiki links, because you take steps to hide the former and not the latter! Forget modes and preferences and CSS tricks; just include the stuff for everybody, and remove the other stuff for everybody. Do it the simple, correct way: make the metadata XML and put it in an HTML comment.
If you really want to show the metadata for screen readers, then change the CSS and show it for visual browser users, too. See how many seconds it takes for the wailing and reverting to commence. If we don't want to look at something, then it's not okay to make the disabled tolerate it, just because we're too lazy to build the page the right way in the first place. Usually, extraneous text and HTML is more of a burden on people who already have a hard time just reading web pages. Michael Z. 2006-01-30 10:08 Z
By the way, HTML comments in wikitext are stripped from the output, so that isn't an alternative. Mike Dillon 16:16, 30 January 2006 (UTC)
Mike Dillon, all text browsers and screen-readers handle links (interwiki or otherwise) just fine. The ability to link to other pages like that has been standard since the first browsers and screen-readers have been designed to handle them (otherwise the list of insertable character links at the bottom of the edit screen would be a nightmare). CSS is a more recent invention and is not supported by many of the standard applications for older platforms. Can someone explain whether the suggested alternatives would work or not? Is there any reason this information can't just be 'commented out' with <!-- --> or another method? Isn't that just as easy as using a 'meta-data' table type? --CBD 12:14, 30 January 2006 (UTC)
Commenting out is not appropriate as Persondata is meant to be seen and readable by those who are interested in using it and/or editing it, including the blind. It is meant to be hidden by default, however, since most people have no use for it. The ability to have it hidden is a nicety, but I would rather have it shown to all than shown to none (at least in this initial period of implementation). Really though, I see this as a false dilemma, as having it hidden for everyone except older screen readers does not seem like a problem. There is nothing inherently problematic about having the contents of Persondata read. The information is valid and readable. As Mike said, the only potential issue is that the information is probably, but not necessarily, redundant with article content, but then again, so are most infoboxes. Unfortunately, the MediaWiki software does not give us a better way to implement this type of content. Kaldari 13:51, 30 January 2006 (UTC)
I think I understand the accessibility issues pretty well, and I feel I've articulated my position well enough. Your implication that this is only a "problem" for disabled users is disingenuous; it is also a problem for sighted users of Lynx and other text-mode browsers. Believe it or not, there are probably more sighted users reading and editing Wikipedia with Lynx than there are visually impaired users with screen readers. I'm wondering if either of the critics have looked at the Persondata table either with the metadata CSS enabled or with Lynx, or whether they themselves need a screen reader and have found Persondata to be a problem. I applaud your evangelism for the visually impaired, but I believe your criticisms are overblown and misdirected. Mike Dillon 16:13, 30 January 2006 (UTC)
P.S. I actually think a good way to output the stuff in {{Persondata}} would be using HTML meta tags, but having a way to do that in MediaWiki would probably expose Wikipedia to yet another linkspam channel. However, I don't think the problem is any worse than someone adding spam to External links; it will be caught as often as other vandalism is caught. Then, we could use "X-Persondata-Name", "X-Persondata-Alternative-Names", etc. and keep it out of the document stream for everyone. Personal Javascript and CSS could be used to visualize the Persondata for editors who need to see it. Mike Dillon 16:40, 30 January 2006 (UTC)
(edit conflict) Heh. As I haven't offered any criticisms I don't see how they can have been "overblown and misdirected". Nor how my statements about "text browsers and screen-readers" provided an "implication that this is only a 'problem' for disabled users". I'd assume you were talking to someone else, but you said 'two' people and Mzajac is only one. I use Firefox and thus have no problems with CSS, but I've talked with several users who do while dealing with the 'hiddenStructure' method of suppressing variable text. I agree that this 'metadata' implementation isn't as big a deal because the text is at the bottom of the article and would be displayed/spoken correctly (though perhaps redundantly). However, it would still be preferable to avoid methods which don't work for everyone. It sounds like commenting out doesn't work because we want people to be able to choose to display this info. Most of that could be done with templates, but I don't know of a way to identify the user currently logged in for the template. Javascript would have the same issues as CSS. The data itself could be set in HTML tags (which would then just be ignored by software which doesn't understand them), but I'm not sure we could then toggle display of those by user... should be a way to do it though. --CBD 16:52, 30 January 2006 (UTC)
I have looked at it in Lynx and in Safari with all CSS off. It's not bad-looking or huge, and Jaws users can skip a table. I'll settle myself down now, but I'm still concerned about the principle of using CSS to hide document structure. Remember that this project is a real ground-breaker, and it will probably set precedents for a lot of other metadata work.
I did not mean to imply that disabled users were the only ones that are affected, but in some cases they will be the ones who have to overcome the most difficulties in navigating web pages, and it is for them that these accessibility issues are most important. We can't foresee every possible situation, so it's important to stick to the word and spirit of good accessibility guidelines.
By the way, is there a reason the field names are in all uppercase text? They would be less obtrusive when visible if they followed Wikipedia's conventions for capitalization, but if they must be caseless, then all lower-case would look a bit less like a usenet flamewar.
If the data must be evident, but not necessarily constantly readable, I can think of a few ideas:
  • Build it into an infobox, to eliminate redundancy of both editing and reading. It could contain presentation fields (e.g. the primary name), along with hidden fields (e.g. all twenty alternate names).
  • Minimize the visible component, e.g. just have a little tag along the lines of the wiktionary link, advertising the fact that you can edit the page to see and edit all of the comment-hidden metadata.
  • Use title attributes, a table summary attribute, or other accessibility device to present the data. Then it can be available on the page but unobtrusive. A table summary could be switched on and off using CSS or javascript, and is always accessible to screen readers.
  • Keep a link and summary on the page, but put the metadata in a sub-page at, e.g., [[Article/metadata]].
And although raw comments are expunged by the wikitext processor, I'm sure a creative template-writer could think of a way to include them in the page. Michael Z. 2006-01-30 17:02 Z
I like the sub-page idea a lot. What we really need is a meta-data namespace! Kaldari 17:21, 30 January 2006 (UTC)
Just thought of a potential problem: I don't think subpages follow a moved page.
But a specialized namespace that worked like the talk: namespace would get around that. Different metadata in XML format could easily be mixed on one page, as long as the XML namespaces used for fields don't collide. It could be easily editable and readable as long as the XML formats were kept simple—heck, it could even be styled with CSS. Michael Z. 2006-01-30 18:22 Z
I'm not positive, but I believe that mixing XML namespaces would cause problems with even more current software than the CSS display suppression. To avoid messing up software that doesn't understand namespaces, we'd have to avoid PCDATA entirely and use only attribute values for content. I actually think the MediaWiki namespace idea sounds really good too.
Also, as for what I said earlier about Javascript, I meant that you would toggle on the display of metadata with Javascript, not suppress it. Assuming that MediaWiki had support for meta tags (which are part of the HTML head, not the body), it would be pretty easy to write Javascript that extracted data from the Meta nodes in the DOM and injected a table or whatever into the document stream based on user-defined Javascript (in monobook.js or equivalent). On further thought, though, this wouldn't work because the meta tag values can't contain links. Mike Dillon 02:44, 31 January 2006 (UTC)
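For illustration only, the extraction step described above could look roughly like this. It is a sketch under the assumption that MediaWiki emitted meta tags named "X-Persondata-...", which it does not do; the function name and the prefix constant are hypothetical:

```javascript
var PERSONDATA_PREFIX = 'X-Persondata-';

// Collect persondata fields from meta elements. `metas` is any array-like
// of objects with `name` and `content` properties, for example the result
// of document.getElementsByTagName('meta').
function collectPersondata(metas) {
  var fields = {};
  for (var i = 0; i < metas.length; i++) {
    var name = metas[i].name || '';
    if (name.indexOf(PERSONDATA_PREFIX) === 0) {
      // Strip the prefix so "X-Persondata-Name" becomes "Name".
      fields[name.substring(PERSONDATA_PREFIX.length)] = metas[i].content;
    }
  }
  return fields;
}
```

The values come back as plain strings, which is exactly the limitation noted above: there is no room for links inside a meta content attribute.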
If the data is in a table, identified by a good caption and summary attribute, then it should be easy for most screen reader users to just skip the table. Michael Z. 2006-02-07 22:37 Z
Do you happen to know if there is a way for Wikipedia to export these screen reader rules for common software? In other words, do you know if common screen readers have the ability to import/export these rules? If so, Wikipedia could provide these profiles to improve accessibility for those using adaptive technologies. What would be ideal would be if the screen readers could automatically find this profile (using either a fixed filename like robots.txt or an HTML "meta" or "link" tag). I'm going to start a separate thread about the caption and summary. Mike Dillon 06:17, 8 February 2006 (UTC)

Metadata format

I think embedded XML may be a better way to go than HTML meta tags. This would be easier to parse out with a standard XML parser, and would enable us to use standard namespaces, like the Dublin Core, or namespaces specific to particular discipline. RDF is one commonly-used method to do this. Michael Z. 2006-01-30 17:10 Z

Location in article

User:Reinyday has changed Wikipedia:Persondata to indicate that the persondata template should precede the category links. I understand there's still some discussion about locating Metadata for accessibility by browsers and bots. It strikes me that coordination with Category:Articles to check for link ordering could go both ways. We could stay out of the way of the cleanup bots, or they could be reprogrammed to not interfere with this project. Which is to be preferred? --Dystopos 20:40, 30 January 2006 (UTC)

Is there any downside to agreeing to User:Reinyday's change? (other than potentially having to move the 350 existing template inclusions) Kaldari 00:19, 31 January 2006 (UTC)
I think that the Wikipedia:AutoWikiBrowser follows the old styleguide on Wikipedia:Persondata, placing Persondata between categories and interwiki links.
By the way, the count is now 468. You can now actually trust Special:Whatlinkshere/Template:Persondata after the touches were done by Bluebot. Mike Dillon 03:41, 31 January 2006 (UTC)
Since no one has objected, I guess we'll leave the change. If you've added persondata to articles, you may want to move it to immediately before the categories now. Kaldari 02:16, 5 February 2006 (UTC)

Alternative to messing with monobook.css every time

Add the following code to your monobook.js instead:

function addlilink(tabs, url, name, id, title, key){
    var na = document.createElement('a');
    na.href = url;
    na.appendChild(document.createTextNode(name));
    var li = document.createElement('li');
    if(id) li.id = id;
    li.appendChild(na);
    tabs.appendChild(li);
    if(id) {
        // ta is the tooltip/accesskey table defined in wikibits.js
        if(key && title)
            ta[id] = [key, title];
        else if(key)
            ta[id] = [key, ''];
        else if(title)
            ta[id] = ['', title];
    }
    // re-render the title and accesskeys from existing code in wikibits.js
    akeytt();
    return li;
}

function addTab(url, name, id, title, key){
    var tabs = document.getElementById('p-cactions').getElementsByTagName('ul')[0];
    return addlilink(tabs, url, name, id, title, key);
}

// Toggle the inline display style of the element with id="persondata"
function doToggleMeta() {
  var element = document.getElementById('persondata');

  if (element.style.display == 'none') element.style.display = 'block';
  else element.style.display = 'none';
}

// Add the tab only on pages that actually contain a persondata element
function addToggleMeta() {
  if (document.getElementById('persondata') != null)
    addTab("javascript:doToggleMeta()", "Show/hide persondata", "ca-togglemeta", "Toggle persondata on/off", "");
}

// Skip edit pages, then hook into the page load event
if (document.title.indexOf("Editing ") != 0) {
  if (window.addEventListener) window.addEventListener("load", addToggleMeta, false);
  else if (window.attachEvent) window.attachEvent("onload", addToggleMeta);
}

It will add a "Show/hide persondata" tab in articles that contain the persondata template (well, technically anything that includes id="persondata" on any element); clicking on it will toggle the visibility of the persondata table on or off. Nothing happens if the page does not contain an element with the persondata id. This is just a "quick and dirty" script I cobbled together. So feel free to refine and expand on it, but it works (in Opera 8.5 at least), and beats having to edit the monobook.css file every time you want to look at these tables ;) --Sherool (talk) 01:52, 6 February 2006 (UTC)

That's pretty cool. I actually had addlilink and addTab already, so I just needed doToggleMeta, addToggleMeta, and the load/onload attachments. I actually wanted it to be a toolbox link, so I used addToolboxLink instead of addTab. By the way, this works whether you default to hiding or showing Persondata.
Here's the Javascript for addToolboxLink:
function addToolboxLink(url, name, id) {
    var tb = document.getElementById('p-tb').getElementsByTagName('ul')[0];
    addlilink(tb, url, name, id);
}
After that, the call to addTab in addToggleMeta needs to be changed to addToolboxLink. I tested this in Galeon, a Mozilla-based browser. Mike Dillon 02:29, 6 February 2006 (UTC)
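One caveat worth noting: a toggle that reads only the element's inline style cannot see a display:none that comes from the stylesheet, so the first click may appear to do nothing when the table starts out hidden by CSS. A possible variant, sketched under the assumption that the browser provides window.getComputedStyle (older Internet Explorer versions exposed currentStyle instead), decides from the computed value. The function names are illustrative:

```javascript
// Decide the next inline display value from the computed one.
// 'block' is used for showing, matching the value in the script above.
function nextDisplay(computedDisplay) {
  return computedDisplay === 'none' ? 'block' : 'none';
}

// Toggle the persondata element based on what is actually rendered.
function togglePersondata() {
  var element = document.getElementById('persondata');
  if (!element) return;
  var computed = window.getComputedStyle(element, null).display;
  element.style.display = nextDisplay(computed);
}
```

togglePersondata could be wired up in place of doToggleMeta in the tab or toolbox link.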

Alternate name: other scripts, transliterations, etc

Should the alternate name field include all of the following? Are all Unicode characters okay?

  • Alternate English spellings
  • Alternate transliterations of a foreign name
  • Name in other scripts (Cyrillic, Arabic, Chinese)

This could really grow. For example, this page lists 58 different Latin versions of Nikita Khrushchev's last name, transliterated from various languages (but if you are searching Chinese books for Khrushchev, you need to search for Hei-lu-hsue-fu, He Lu Xiao Fu, and Ho-lu-hsiao-fu). And of course, there would additionally be a number of non-transliterated versions in other writing systems.

Is there a way to label all the many versions as to their source language and transliteration system; that is, can we just put this info in the field in parentheses or something? For example:

Хрущев, Никита Сергеевич (Russian Cyrillic);
Хрущёв, Никита Сергеевич (Russian Cyrillic, dictionary-correct);
Khrushchëv, Nikita Sergeyevich (Russian, BGN/PCGN transliteration);
Khrushchyov, Nikita Sergeyevich (Russian, simplified BGN/PCGN transliteration: Wikipedia);
Khrushchev, Nikita Sergeyevich (conventional English, based on simplified BGN/PCGN transliteration);
Khrushchëv, Nikita Sergeevich (Russian, ALA-LC transliteration);
Hruščëv, Nikita Sergeevič (Russian, GOST transliteration);
Xruščëv, Nikita Sergeevič (Russian, scientific transliteration);
Hruŝëv, Nikita Sergeevič (Russian, ISO 9:1995 transliteration);
Хрущов, Никіта Серґеєвич (Ukrainian Cyrillic);
Khrushchov, Nykita Sergeyevych (Ukrainian, BGN/PCGN transliteration);
Chruschtschow, Nikita Sergejewitsch (German phonetic transcription);
ニキータ・セルゲーエヴィチ・フルシチョフ (Japanese)
etc, etc.

Or should these have HTML/XML metadata (meta-metadata?), like

<span lang="ru" title="Russian Cyrillic">Хрущев, Никита Сергеевич</span>;


<span lang="ru">Хрущёв, Никита Сергеевич</span> (Russian Cyrillic, dictionary-correct);

Someone would find this useful, but for our purposes would this constitute good metadata, or a wasteful bit-dump? Michael Z. 2006-02-07 23:01 Z

Personally I think the alternate names list should be limited to commonly used versions of their name or names they used themselves. For example, since Magellan lived in both Portugal and Spain, we would list the Spanish and Portuguese transliterations of his name, but not, for example, the Dutch transliteration. For each language, I think only the most common transliteration is necessary. Kaldari 23:42, 7 February 2006 (UTC)
There is definitely no such thing as "the most common transliteration". In the example above, there are two "conventional" systems, both often used in the text of publications or in the popular media. ALA-LC is used in North American libraries and academic papers. BGN/PCGN is used in the UK, and appears in a lot of books, too. GOST is the official Russian standard. None of these can be written off as unimportant. People will edit-war over them.
And if this is meant to be useful to serious scholars, shouldn't it contain foreign transcriptions, to help them search more than just English-language publications?
Besides, Wikipedia's conventional transliteration is already in the article lead, so what's the point of a separate metadata block repeating that? It would be better to just add an ID attribute to the text in the article and avoid redundancy. Michael Z. 2006-02-07 23:51 Z
I don't think it's necessary to have every language listed in the English article. Ideally, other languages will also have persondata sections for each article, so the respective versions can be gleaned from those other languages. It may be useful, however, to have the subject's native language listed, as they would probably have been commonly referred to under that name.
Regarding redundancy with the article text, yes, the information is usually completely redundant, but that's OK. And really, if you feel like adding every possible language and transliteration to the alternate names field, I don't think anyone's going to stop you. I just don't feel like it's really necessary. Kaldari 00:06, 8 February 2006 (UTC)
I don't relish the idea of entering that, but someone might. I just think it's important to anticipate this, so it doesn't unexpectedly reduce the usability of the persondata scheme.
If it does happen, this kind of meta-metadata would make the results much more useful, rather than having a string of 20 or 30 names and having no way of knowing where they come from, what they represent, or whether they are real or bogus. Michael Z. 2006-02-08 00:37 Z
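As a rough illustration of how the parenthesised labels would help a consumer of the data, here is a JavaScript sketch that splits such a field into name and note pairs. It assumes the format shown in the example list above (entries separated by semicolons, an optional label in trailing parentheses); the function name is hypothetical, and a name containing a semicolon would not be handled:

```javascript
// Split an alternate-names field into { name, note } records.
// Entries without a trailing "(...)" label get note: null.
function parseAlternateNames(field) {
  var entries = [];
  var parts = field.split(';');
  for (var i = 0; i < parts.length; i++) {
    var part = parts[i].trim();
    if (part === '') continue;
    var m = /^(.*?)\s*\(([^()]*)\)$/.exec(part);
    if (m) {
      entries.push({ name: m[1], note: m[2] });
    } else {
      entries.push({ name: part, note: null });
    }
  }
  return entries;
}
```

Commas inside a name or a label are left alone, since only the semicolon is treated as a separator.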

Caption and summary

Michael Z. has suggested that we include a caption and summary on the Persondata table. The caption is easily supported with the "|+" wikitext and the summary attribute can be added directly where the class and id are now. The question is, what should they contain? I'm not sure what the distinction between caption and summary is. In any case, any of the Persondata fields could be included, so it could be something like "Biographical data for Bush, George Walker" or some such. Suggestions? Mike Dillon 06:21, 8 February 2006 (UTC)

I guess I'll answer myself from the HTML spec [3]:

When present, the CAPTION element's text should describe the nature of the table. The CAPTION element is only permitted immediately after the TABLE start tag. A TABLE element may only contain one CAPTION element.
Visual user agents allow sighted people to quickly grasp the structure of the table from the headings as well as the caption. A consequence of this is that captions will often be inadequate as a summary of the purpose and structure of the table from the perspective of people relying on non-visual user agents.
Authors should therefore take care to provide additional information summarizing the purpose and structure of the table using the summary attribute of the TABLE element. This is especially important for tables without captions. Examples below illustrate the use of the summary attribute.
Visual user agents should avoid clipping any part of the table including the caption, unless a means is provided to access all parts, e.g., by horizontal or vertical scrolling. We recommend that the caption text be wrapped to the same width as the table. (See also the section on recommended layout algorithms.)

So the caption is visible when Persondata is visible, while the summary is not. Mike Dillon 06:24, 8 February 2006 (UTC)

Right. The caption is essentially a title for the table, and should concisely describe what the table is; it should replace the current two-column header row. The summary serves screen-reader users in lieu of being able to glance at the table and see what it contains, perhaps summarizing a table's conclusions or trends, in addition to describing its subject. Most screen readers can read out the list of table headings in response to a keystroke, so I don't think it's necessary to repeat them all here, although for this purpose the field names should be put in headers (!) instead of just plain table cells (|).
For persondata, something like the following may be good, if it's technically possible. Michael Z. 2006-02-08 18:31 Z
Wikipedia persondata for John Smith
Metadata about John Smith, his other names, birth, and death

Here is a version with the proposed changes: User:Mike Dillon/Persondata. A comparison against the current table is at: User:Mike Dillon/Sandbox. Feel free to tweak the template. Mike Dillon 02:46, 9 February 2006 (UTC)

Some of the styling currently done as inline CSS could be moved into the common definitions if the style is accepted. Mike Dillon 02:47, 9 February 2006 (UTC)
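To show what the proposed markup would amount to in the rendered HTML, here is a small JavaScript sketch that builds such a table as a string. It is not the template itself: the function names are hypothetical, the summary wording is only one possibility, and field values are treated as plain text even though real Persondata values can contain links:

```javascript
// Escape text for safe use in HTML content and attribute values.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Build a persondata table with a caption, a summary attribute,
// and field names in header cells. `fields` is an array of
// [fieldName, value] pairs.
function buildPersondataTable(displayName, fields) {
  var summary = 'Metadata about ' + displayName + ', other names, birth, and death';
  var html = '<table id="persondata" class="metadata" summary="' +
    escapeHtml(summary) + '">\n';
  html += '<caption>' + escapeHtml('Wikipedia persondata for ' + displayName) +
    '</caption>\n';
  for (var i = 0; i < fields.length; i++) {
    html += '<tr><th scope="row">' + escapeHtml(fields[i][0]) + '</th><td>' +
      escapeHtml(fields[i][1]) + '</td></tr>\n';
  }
  return html + '</table>';
}
```

The caption replaces the two-column header row, and each field name sits in a header cell so that screen readers can list the headings.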

Metadata WikiProject

Hi all. After some of the discussions here, I've been thinking about proposing a Metadata WikiProject. The goals would be to expand the availability of various kinds of metadata on Wikipedia, coordinate metadata strategies for domain-specific initiatives (e.g. Persondata for biographies), and drive changes to the MediaWiki software to support the capture and use of metadata. Would anyone active here be interested in such a project? For a while, I was also thinking that a WikiProject Accessibility would be needed, but this seems to be part of WikiProject Usability. Mike Dillon 04:55, 12 February 2006 (UTC)

I was thinking about proposing a project to help expand the usage of Persondata templates. If it is part of a bigger project instead, I would be fine with it. In any event, I am definitely willing to help out. --PS2pcGAMER (talk) 00:18, 25 February 2006 (UTC)

Update count

I find it rather discouraging to see so few Persondata articles, and such a stale count on the main page. Any chance someone could update the number of articles with persondata in'em? --maru (talk) contribs 19:27, 11 April 2006 (UTC)

User:Mike Dillon updated it for you. Just so people know, if they have access to WP:AWB, the approximate total can be calculated with it. --PS2pcGAMER (talk) 21:09, 15 April 2006 (UTC)

Any conclusions and recommendations for other languages?

The Persondata has now been on en.WP for four months (and de.WP for almost two years) and has been included in 950 articles. Is it a success or a failure? Would you recommend that other languages (e.g. Swedish) implement the same, or something different, or nothing at all? What are the success factors and what are the obstacles? Can automatic text mining of the existing articles be a successful alternative? --LA2 11:39, 28 April 2006 (UTC)

Hard to say. On English, it's a bit of a failure, as it is not in many articles and there doesn't seem to be any use of it. But I don't know how it is used on de. And from what I hear of the Personendaten drive prior to the publication of the de Wikipedia, text mining needs to be hand-checked by a human. --maru (talk) contribs 13:37, 28 April 2006 (UTC)
I don't think we should give up so soon, but I don't think implementing a persondata on an article-by-article basis, often duplicating other infoboxes, is a long-term flyer. See below. Pcb21 Pete 14:49, 28 April 2006 (UTC)
My conclusion is that the lack of applications that use the data is a more severe failure than the small number of people that have been tagged. If more useful applications (e.g. a people search index at existed, I think more biographies would get tagged pretty soon. I'm waiting for more applications to turn up before I try to introduce persondata in the Swedish Wikipedia. --LA2 06:02, 1 May 2006 (UTC)
Yeah. but is that a chicken-and-egg problem? --maru (talk) contribs 12:33, 1 May 2006 (UTC)
No Problem, here it is (for the German Wikipedia): 23:00, 15 March 2007 (UTC)

Revisiting Infobox Person

I think we should be bold and push on with integrating this infobox as an implementation detail of Infobox Person, and that template should be an implementation detail for all the other infoboxes (the canonical example above being the Presidential infobox). If we design/modify the templates in the right way, we really will start getting a lot of uses of the data and it will be accepted by the community at large because only very minor appearance differences (at most) will be seen. What do you think? Pcb21 Pete 14:47, 28 April 2006 (UTC)

I still prefer keeping Persondata separate, for several reasons. Primarily, I think there will be resistance to including Persondata within other templates. There are several editors who are rather strongly opposed to having hidden metadata at all, but they seem satisfied if Persondata is limited to appearing after the article itself. I think all that is really necessary to achieve more widespread adoption is a project to promote adding Persondata and hopefully a tool to automate it at some point. Kaldari 22:13, 28 April 2006 (UTC)
I think it's a good idea. If need be, we can have someone with AWB or a bot go around extracting out the persondata to make those people happy. --maru (talk) contribs 02:00, 29 April 2006 (UTC)
I don't think that not trying because there might be resistance is a good reason. Who are the several editors who are opposed to hidden metadata? What grounds are they objecting on? We could go down the route of adding persondata separately, but then you will definitely get users complaining about the duplication, and I think that is a legitimate complaint. Pcb21 Pete 17:36, 30 April 2006 (UTC)
So are you proposing using persondata only within Infoboxes and never separately? Kaldari 20:44, 30 April 2006 (UTC)
If infobox person implemented persondata internally, would an editor ever want to add a persondata rather than an infobox? If not, then I guess I am proposing that. Personal CSS can be used to turn display on or off. Pcb21 Pete 14:37, 1 May 2006 (UTC)

[edit] Location "as was" or "as is now"?

Should the locations of birth and death be given as the current designation (name, administrative division, country) or the designation at time of death? For instance, did Charles Darwin die in "Downe, Bromley, Greater London", or did he die in "Downe, Kent" (as it then was?); did Richard Francis Burton die in "Trieste, Austria-Hungary" (as many biographies describe) or in "Trieste, Italy"? TheGrappler 18:58, 20 May 2006 (UTC)

I would lean toward "as it was". I looked to see if this is addressed in Wikipedia:Manual of Style (biographies), but it's not there. I didn't see any mention of a standard for this sort of thing within articles at either Wikipedia talk:Manual of Style (biographies) or Wikipedia talk:WikiProject Biography, but those might be good places to ask. Since the MOSBIO page does say that biographies of dead people should be written in the past tense, it seems to imply that in general, events should not be described anachronistically. If a modern frame of reference is required, the article text should say something like: Downe, Kent (now part of the Bromley borough of Greater London). Mike Dillon 19:24, 20 May 2006 (UTC)

[edit] "Flourished" and burial location

On a completely different note: standardizing which fields to include is obviously extremely important. I really think that people have overlooked the importance of "flourished" - this is critical to many figures from Classical civilization, and outside of nobility, the Dark Ages and Middle Ages too. In many instances there is nothing inherently subjective about "fl.", and it is utterly encyclopedic - it is used by many well-respected biographical and encyclopedic sources. For many people, it is simply "their first entry in the historical record" to "their last entry in the historical record" when they faded into anonymity. For artists it may be from their first major work to their last. There is no need to make up "likely" years of birth or death. Another field that I think has a shot at being useful and encyclopedic is "Burial location" (or location of final resting place; there may be a better way to phrase this!). Many biographical summaries contain this piece of information, I think it would be relevant to Persondata too. So long as we have relatively few entries, Wikipedia should be open to adding to the Persondata fields; I am sure that previous entries could be expanded by means of a bot if needed. TheGrappler 19:34, 20 May 2006 (UTC)

For more information, the very brief floruit article is very useful. TheGrappler 19:38, 20 May 2006 (UTC)
I agree that flourished can be a very useful distinction to make. For myself, I would much prefer to have clear and reliable flourished dates than subjectively made up birth and death dates, which can vary widely between sources, and tend to rely on flimsy guesswork. However, I didn't know what fl. meant until my senior year in college, and neither did my T.A., so perhaps some education will have to happen with it. Mak (talk) 20:29, 20 May 2006 (UTC)
I strongly agree. We need this field. For example, for not just many, but most composers (and probably artists) of the late Middle Ages and early Renaissance, we only know the periods that they were active, between their first dateable entry in the historical record and the last. Sometimes they were fifteen years old when they began "flourishing" and sometimes forty; we can't know. When we get an actual birth date we're lucky. "Flourishing", as TheGrappler notes, is peculiar to artists, writers, composers, and such, and there is nothing subjective about it, since those dates are reported by the New Grove, Britannica, and others of our principal sources. We don't want to be in a position where we have to make an original-research guess as to when someone was born (was Hugo de Lantins born in 1380 or 1405? How about Perotin, where all we know is that he probably did most of his work around 1200?). Antandrus (talk) 20:35, 20 May 2006 (UTC)
It's interesting to see there is support for this proposal - it's obviously not as "dead" as the old discussion's petering off suggests. Should there be two fields ("began to flourish" and "ceased to flourish"?) or will one suffice? Two seems excessive for examples such as Perotin, and it would be easy in all other cases to compress the data into one field. While that sounds natural, I just wondered if someone more technical knows whether that makes the data harder to read and use? Also, did anybody have any thoughts about adding final resting place as a field? TheGrappler 21:32, 20 May 2006 (UTC)

>"For myself, I would much prefer to have clear and reliable flourished dates than subjectively made up birth and death dates, which can vary widely between sources, and tend to rely on flimsy guesswork."
Surely you can't seriously be suggesting that birth and death dates are more subjective than "flourished" dates. That's ridiculous. Perhaps you are being sarcastic and I'm missing the joke. If anything, it seems to me that flourished dates would be completely subjective and often based on original research. Kaldari 03:20, 21 May 2006 (UTC)

The "flourished" dates are the beginning and end of the person's trace in the historical record. Grabbing an example more or less at random, Antonio da Cividale first appears when he enters a monastery in 1392 (his age is not known, or even guessed); and he disappears in 1421, the date of the last precisely dateable composition he wrote. Any guesses of birth and death for him would be original research, but the beginning and end of "flourishing" can be sourced reliably. Antandrus (talk) 03:27, 21 May 2006 (UTC)
Indeed, it isn't limited to artists, musicians and writers. Tanaka Shosuke first appears in the historical record when he became the first recorded Japanese in the Americas in 1610. In 1613 he crossed the Pacific again, as part of a project to send the first Japanese embassy to Europe. But whether in 1614 he changed ships to travel on to Europe, stayed in Mexico to await the return of the Europe-bound contingent, or headed back to Japan in the return journey of the original ship, is simply lost to history. It's as if he "arrives" in 1610 and "vanishes" in 1614, dates we know exactly and which will only be extended if new evidence is discovered. What is the point of making up fictitious approximate birth and death dates when we have these exact dates to hand? It's "research" to say "fl. 1610-1614" only in the sense that this is the period in the historical record which another publication (our source) declares him to be "known to history"; it's not a case of Wikipedians digging around the primary evidence themselves so it's not "original" research. Of course, that publication could have overlooked a subsequent or earlier piece of documentation about the subject, so in a sense the fl. dates are based on historians' knowledge and understanding of the historical record rather than an "absolute" like a birth or death date, and may potentially be subject to change. But compare this to birth and death dates: evidence about when somebody was born or died is also often scant and disputed. Felice Beato was originally thought by historians to have been born in 1825, but now it appears he had been mixed up with his brother and the date was likely 1833 or 1834. A "Felice Beato" was born in Corfu in 1834 but nobody knows for sure if that's the right guy. So not only has his birth-date "changed" when historians realised their error, no new date has been confirmed. (Even more oddly, nobody knows quite when he died - he simply vanished! A date of death has been suggested based on the liquidation of his business, but this is not clearly any more "objective" than the use of floruit for a medieval figure.) TheGrappler 09:08, 21 May 2006 (UTC)
I'm sure there are other examples similar to the ones you show above, but for the vast majority of people with Wikipedia articles, the birth and death dates are well established and the flourished dates would be completely subjective. Flourished dates only seem to be commonly used for people who are known from a handful of historical artifacts and for which birth and death records are not available. For these people, it is common to cite flourished dates in place of the birth and death dates. We could do that within the existing framework. For example, for Tanaka Shosuke, you could put his birthdate as "before 1610" and his deathdate as "after 1614", or you could put "circa 1600" or something similar. I just don't imagine that many people doing searches based on flourished dates (which is effectively the purpose of the metadata). Kaldari 19:15, 8 June 2006 (UTC)
On the contrary, searching for "flourished" dates actually makes a lot of sense: that's precisely the period when that person is interesting! And I don't think that setting birthdate as "before 1610" is a particularly good way of doing this. Think about it: if I wanted to search "who's about in 1100?" would I want Shosuke turning up? Of course not, but by saying he was born "before 1610" he's not excluded from being born before 1100. If we did it only by flourished dates he would be excluded. Now if I wanted to search "who's about in 1612?", Shosuke would turn up, along with anybody born in 1612 or earlier and dying in 1612 or later. If one was using an especially clever search, it might even give a certain period of leeway from flourished dates that is not given to birth and death dates (or at least, to the earliest possible birth dates and latest possible death dates). Who'd use it? Well, serious academics (historians, biographers, encyclopedists) are well used to using "flourished" or "floruit" rather than birth and death dates. Does it make sense to search for? Clearly. Is it subjective for people for whom we have birth and death dates? Well, we wouldn't be using it if we had birth and death dates! But if we want serious credibility we should get in line with academics and use "fl." when they use "fl." TheGrappler 06:28, 24 June 2006 (UTC)
It seems that there is consensus for this. How do we implement it? I've never worked with "metadata" type things, so I won't venture to edit the article page (plus, I have a funny feeling that wouldn't actually make it work). Mak (talk) 18:59, 8 June 2006 (UTC)
One thing that needs to be changed is the documentation that says all fields are compulsory. Two things that need to be decided are whether we call it "flourished" or "floruit" (the latter may be more common but is also somehow more obscure!) and whether it needs two fields (for start and end) or just one. TheGrappler 06:28, 24 June 2006 (UTC)
Although I still would prefer working within the existing framework rather than adding a new field, I think "flourished" is much more accessible than "floruit". Wikipedia is not targeted to academics after all. I wonder if we could find out how the German Wikipedia handles flourished dates (if at all). Kaldari 17:30, 24 June 2006 (UTC)
I'd agree with using "flourished" rather than "floruit" - the main advantage of the latter is that people might take a look at the rather good floruit article. The German Wikipedia seems to deal with it by leaving birth and death fields blank if there are specific floruit dates (see de:Osmund). One clear advantage I can see to using a floruit field is that it means that a person using the data can make their own decision about how much leeway to give either side of the floruit dates, depending on what they are searching for. TheGrappler 17:36, 25 June 2006 (UTC)
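To make the search argument above concrete, here is a minimal Python sketch of a "who's about in year X?" query. It is only an illustration under stated assumptions: the field names `fl_start` and `fl_end` and the `leeway` parameter are hypothetical and are not part of the Persondata template. It keeps floruit dates separate from birth and death dates so that the person using the data decides how much leeway to allow on either side.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    name: str
    birth: Optional[int] = None     # year of birth, if known
    death: Optional[int] = None     # year of death, if known
    fl_start: Optional[int] = None  # hypothetical field: first dated record
    fl_end: Optional[int] = None    # hypothetical field: last dated record


def about_in(person: Person, year: int, leeway: int = 0) -> bool:
    """Return True if the person could be 'about' in the given year."""
    if person.birth is not None and person.death is not None:
        # Firm birth and death years: no leeway is applied.
        return person.birth <= year <= person.death
    if person.fl_start is not None and person.fl_end is not None:
        # Only floruit dates: the searcher chooses how far to stretch them.
        return person.fl_start - leeway <= year <= person.fl_end + leeway
    return False  # not enough data to decide


people = [
    Person("Tanaka Shosuke", fl_start=1610, fl_end=1614),
    Person("Darwin, Charles", birth=1809, death=1882),
]

print([p.name for p in people if about_in(p, 1612)])             # ['Tanaka Shosuke']
print([p.name for p in people if about_in(p, 1100)])             # []
print([p.name for p in people if about_in(p, 1600, leeway=15)])  # ['Tanaka Shosuke']
```

With a birth date recorded as "before 1610", the 1100 query could not rule Shosuke out; with explicit floruit years it does.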

[edit] List of persons without Persondata?

I think it would be quite useful if we could have an editable list of people who do not have "Persondata" associated with them. I think this would definitely speed up the implementation and usage, as editors would know which persons still needed the template applied to them. This list could be generated by going through Category:Living people and Category:Dead people and comparing against what links to this template. I don't know enough SQL to pull such a list from the database dumps (or even if it is possible), but I would definitely support and work on such an effort. --Reflex Reaction (talk)• 18:38, 8 June 2006 (UTC)

Still hoping for a response. --Reflex Reaction (talk)• 15:26, 19 June 2006 (UTC)
You can request queries here.--NMajdantalk 15:07, 26 June 2006 (UTC)
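The comparison proposed above amounts to a set difference. Here is a minimal Python sketch, assuming the two lists of page titles have already been extracted by some other means (a query or a dump); the function name and the sample titles are made up for illustration and say nothing about which articles actually carry the template.

```python
def missing_persondata(biographies, uses_template):
    """Return the biography titles that do not use the template, sorted."""
    return sorted(set(biographies) - set(uses_template))


# Hypothetical sample data standing in for the two extracted lists.
in_people_categories = [
    "Cliff Richard",
    "Ferdinand Magellan",
    "George W. Bush",
    "Ross Winn",
]
links_to_template = [
    "Ferdinand Magellan",
    "George W. Bush",
    "Ross Winn",
    "Wikipedia talk:Persondata",  # non-article pages are simply ignored
]

print(missing_persondata(in_people_categories, links_to_template))
```

Pages that link to the template without being biographies (talk pages, project pages) drop out automatically, because only titles from the people categories can appear in the result.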

[edit] FA

I suggested that having persondata on biographical articles be a requirement for Featured Articles. We'll see if it passes.--NMajdantalk 14:55, 26 June 2006 (UTC)

An effective way I've found of publicising Persondata is at least to add it to some of the articles on the main page. This may be a good idea - could you give us a link to the debate? TheGrappler 17:26, 27 June 2006 (UTC)
That is a good idea. Here is the link to the discussion: --NMajdantalk 17:43, 27 June 2006 (UTC)

[edit] Name

Should we put the current name of the person or their birth name? Obviously the other one would go under alternate names... American Patriot 1776 04:24, 2 July 2006 (UTC)

I suppose the main name to use is the article title name. For instance, use Cliff Richard rather than Harry Webb. Noisy | Talk 08:41, 2 July 2006 (UTC)
Agreed. Usually, the current name of a famous person is more widely known to the public than their birth name. However, a redirect page of their birth name to the current name is also a necessary inclusion. --Siva1979Talk to me 18:42, 2 July 2006 (UTC)
An interesting point is whether the article name should match the name field exactly. Having discussed this with User:Circeus, I am of the opinion that the name should be written in full (with initials spelled out etc.) while Circeus believes the name field should match the article title, and details like middle name and initials spelled out should go under the "alternative names". The relevant text in the guidance notes is: "When specifying the person's name, use the following format: [surname], [forename] [middle names], [title]. For most cases this will be straightforward, for example, 'George Walker Bush' becomes 'Bush, George Walker'. In some cases, however, there may be ambiguity about a person's surname. When in doubt, format the name according to how you would expect it to be alphabetized." This, however, is sufficiently ambiguous that Circeus and I are doing different things. We could do with some third opinions, and whatever develops as the consensus, we should clear up the guidance text. TheGrappler 21:48, 8 July 2006 (UTC)
It also says: "It is usually a good idea to list as much of a person's name as possible in the name field to avoid confusion with similar names." Thus I believe you have the right approach rather than Circeus. Kaldari 12:40, 9 July 2006 (UTC)
Seconded. Noisy | Talk 14:58, 9 July 2006 (UTC)
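For anyone scripting this, the format quoted from the guidance notes can be sketched in a few lines of Python. The function name and its parameters are hypothetical. The surname is passed in explicitly rather than guessed, since the guidance itself notes that the surname can be ambiguous.

```python
def persondata_name(surname: str, given_names: str = "", title: str = "") -> str:
    """Build '[surname], [forename] [middle names], [title]', skipping empty parts."""
    parts = [part.strip() for part in (surname, given_names, title) if part.strip()]
    return ", ".join(parts)


print(persondata_name("Bush", "George Walker"))  # Bush, George Walker
print(persondata_name("Perotin"))                # Perotin
```

A person known by a single name comes out unchanged, and a title, when given, is appended after the forenames.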

[edit] Link to Library of Congress Name Authority?

The German Personendaten project included references to the name authority file of the German national library. If a persondata template is being implemented for wikipedia.en, it seems a good opportunity to provide space for a reference to the Library of Congress name authority file (to which OCLC provides an API; OCLC also maintains a URI and URL with the MARC data for each name). Building links between Wikipedia and the Library of Congress name authority file would help machine readers of Wikipedia. It might also help speed up the process of filling in persondata for Wikipedia.

Six months ago there were approx 28,000 people articles (and over 4,000 people stubs) listed on Wikipedia. From this list I tried automated matching against the LoC name authority file (using the stringent conditions that names and birth/death dates needed to match): around a third of people on Wikipedia (approx 10,000) could be found there. I didn’t know where to post the matches I generated between Wikipedia and the name authority file – is anyone interested in them? Dsp13 13:54, 8 July 2006 (UTC)

Stick it in your user space. Someone will want to look at it sooner or later. --maru (talk) contribs 02:56, 8 July 2006 (UTC)
Thank you. I will try to do that. Dsp13 01:15, 10 July 2006 (UTC)
I have now put up a table - of people with surnames beginning with Z - as an example. Comments welcome! Dsp13 02:12, 15 July 2006 (UTC)
I've posted tables of over 10,000 people - born between 1500 and 1850 - matched to the Library of Congress name authority on my talk page. When these tables are complete, adding people born after 1850, I expect around 30,000 more matches. (I've also graphed the total number of people on Wikipedia categorized by birth year - over 188,000.) Comments very welcome here or there! Dsp13 10:52, 11 August 2006 (UTC)
Actually de: didn't do quite what you're implying, although I can see why it might be a good idea to. Look at de:Lewis Carroll for an example - the Persondata is all the usual fields (you may have to click "edit" to see this) and the PND link has been added in the "external links" section (much as MacTutor biographies are for maths articles, for instance). TheGrappler 22:43, 8 July 2006 (UTC)
You are quite right. All I should have said is that provision of name authority external links was historically associated with persondata at de:, and might reasonably be expected to help en: persondata along. I don't have much Wikipedia experience, and hence have no reasoned preference for how name authority links should be added to pages (external links or templates). Dsp13 01:15, 10 July 2006 (UTC)
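The stringent matching described above (name and birth/death dates must all agree) can be illustrated with a short Python sketch. This is not the script that produced the tables; the record layout, the function names, and the identifier `example-0001` are all assumptions made for the example.

```python
def match_key(name, birth, death):
    """Normalise case and whitespace so trivial differences do not block a match."""
    return (" ".join(name.lower().split()), birth, death)


def match_records(wiki_people, authority_records):
    """Map article titles to authority ids where name AND both dates agree."""
    index = {
        match_key(r["name"], r["birth"], r["death"]): r["id"]
        for r in authority_records
    }
    matches = {}
    for person in wiki_people:
        key = match_key(person["name"], person["birth"], person["death"])
        if key in index:
            matches[person["title"]] = index[key]
    return matches


# Hypothetical sample records; the authority id is invented.
wiki_people = [
    {"title": "Lewis Carroll", "name": "Carroll, Lewis", "birth": 1832, "death": 1898},
    {"title": "Charles Darwin", "name": "Darwin, Charles", "birth": 1809, "death": 1882},
]
authority_records = [
    {"id": "example-0001", "name": "Carroll,  LEWIS", "birth": 1832, "death": 1898},
]

print(match_records(wiki_people, authority_records))
```

Requiring all three values to agree keeps false matches rare at the cost of missing people whose dates or name forms differ between the two sources, which fits the roughly one-third hit rate reported above.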

This does raise an interesting question, however. Would it be technically preferable (while Persondata is still young enough to undergo change without substantial disruption) to include database links inside the PERSONDATA template (e.g. German national library/LOC authority files, or links to other online databases of people - particularly for statistical records for sportspeople) or can those other database entries be mined quite easily without direct inclusion inside the database? TheGrappler 23:38, 11 July 2006 (UTC)

I think that links to specialised online databases (i.e. of particular sorts of people) should not be included within the PERSONDATA template: the fields in the PERSONDATA template should be fields which potentially apply to all people. A template with lots of empty slots is undesirable. If links to many online databases were provided in the template, it would also be (1) hard to get agreement on which online databases should be included there, and (2) problematic to maintain, since online links always degrade over time (and Wikipedia can't control the way in which external links degrade). The case for links to name authority files to be included in the template is stronger: though the LOC name authority file began as a specialised database (a list of authors, and only secondarily a list of people mentioned in published texts), it has evolved pretensions to be a one-stop name authority file for the English language. As an authority file, it is essentially designed only to provide URIs, and gives only such information about an individual as helps in this task (i.e. helps to disambiguate the individual from others). The LOC name authority file seems the natural choice for :en, just as the German library was the natural choice for :de. However, before putting links to the LOC name authority in a template, assurance should be sought that the future of the links is guaranteed. (There are plans to integrate different national name authority files - e.g. German and LOC - in a Virtual International Authority File, though I have no idea of the likely timescale for this to be up and running.) Dsp13 02:12, 15 July 2006 (UTC)
As far as I know the LOC does not provide direct linking into their catalogue with a given LOC authority number - they first need to be convinced to make LOC authority numbers useful in en:. For PERSONDATA: have you tried to use German PERSONENDATEN (available at [4], see [5]) and match it via interwiki links to en:? We now have more than 100,000 entries and many of them are probably also in en:. But you also need a way to extract PERSONDATA from en: to create stats and use the data (this could be done via the Wikipedia API for selected articles). -- Nichtich 13:46, 12 August 2006 (UTC)
Thanks Nichtich that's very clarifying. Do you have any idea what might be involved in convincing LOC (based on your German experience?) It seems curious that an authority file should be maintained without any way of leveraging it. I had thought - even without such provision - that (given the authoritative name form from the LOC authority file) one could use SRU (examples) to find texts by, and only by, the author. But I'm now unsure about even this. Depressing. Dsp13 14:05, 14 August 2006 (UTC)
At the moment the situation is that one can in fact directly link into the LOC catalogue, and use the name form given in the name authority file to do so more intelligently (but not quite perfectly). Take John Russell, 1st Earl Russell as an example: From his name authority entry, his preferred name form is 'Russell, John Russell, Earl, 1792-1878' which is the concatenation of subfields in his MARC 100 field '|aRussell, John Russell,|cEarl,|d1792-1878'. Forming a search for someone with that name AND that title AND those dates, we can link directly to books by him, or books by/about/associated with him, in the LOC catalogue.

Dsp13 15:09, 15 August 2006 (UTC)
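The concatenation of subfields mentioned above can be shown in a few lines of Python. This is a simplified sketch that handles only the '|' delimited display form quoted in the comment, not real MARC records; the function names are hypothetical, and a repeated subfield code would overwrite the earlier value.

```python
def parse_subfields(field: str) -> dict:
    """Split a '|a...|c...|d...' string into {subfield code: value}."""
    subfields = {}
    for chunk in field.split("|")[1:]:  # text before the first '|' is ignored
        if chunk:
            subfields[chunk[0]] = chunk[1:]
    return subfields


def preferred_name(field: str) -> str:
    """Join the subfield values in order; each keeps its own trailing comma."""
    return " ".join(value.strip() for value in parse_subfields(field).values())


marc_100 = "|aRussell, John Russell,|cEarl,|d1792-1878"
print(parse_subfields(marc_100))
print(preferred_name(marc_100))  # Russell, John Russell, Earl, 1792-1878
```

Keeping the subfields separate (name, title, dates) is what allows a search to require all three at once, as in the Russell example.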

[edit] FYI

In case anyone wants to know, I've been putting persondata into figure skating bio articles. I'm not sure what this will accomplish, but wanted to let someone know that I'm using it. --Fang Aili talk 19:58, 17 July 2006 (UTC)

[edit] Illustration of the effect of viewing Persondata

I'm interested enough to start adding the markup to at least some bios that I work on, but not enough to repeat the trauma of editing my stylesheet for viewing it, without seeing the effect first. How about a pretty picture of an example?
--Jerzyt 15:31, 23 July 2006 (UTC)

I've added a screenshot to the page Circeus 17:42, 23 July 2006 (UTC)

[edit] Why is this separate from Infobox Person?

I'm confused about why both this template and Template:Infobox Person exist. The latter seems to do everything that the former does, and more, plus it has the advantage of being much more noticeable than this template. If I'm not missing something, then I would propose that Persondata be merged into Infobox Person (as several people have done above, it seems). Mike Peel 20:58, 24 July 2006 (UTC)

The main thing this template does is allow for more accurate searches based on the LASTNAME, FIRSTNAME format. The person infobox does not allow that. Also, many times there is a more appropriate infobox that is used, such as NFL player or Musician, etc. And some of those templates do not include data such as date/place of death, aliases, etc.--NMajdantalk 21:09, 24 July 2006 (UTC)
And also the fact that there are numerous variants of Template:Infobox Person, not all of them using the same parameters. See also Category:People infobox templates, all of which are infoboxes that will usually replace Infobox Person where appropriate. Circeus 21:37, 24 July 2006 (UTC)

[edit] Standardized alternate name labels.

In the persondata box for Sofia Kovalevskaya, I listed her name in Cyrillic as an alternate name. I put (cyrillic) after it. Is this right? Should it be (Russian)? (Cyrillic)? Are there, or should there be, guidelines? grendel|khan 22:41, 30 July 2006 (UTC)

When I have to list a foreign spelling, I list it with the language, because if you gave that language's version, it'd be in that alphabet anyway, and other languages using the Cyrillic alphabet might spell the name differently. Circeus 01:00, 31 July 2006 (UTC)
Thanks! In the same vein, should Gene Roddenberry be listed as "Roddenberry, Eugene Wesley" with an alternate of "Roddenberry, Gene"? If only the former is listed, "Gene Roddenberry" wouldn't pick up the record, right? But it's not really a nickname; does it even need a label? grendel|khan 12:43, 1 August 2006 (UTC)
I have been following the German Wikipedia's lead on this. For Name, I would put "Roddenberry, Gene" and then "Roddenberry, Eugene Wesley" for the Alternative Name. Name is for what most people call them and Alternative Name is for a fuller, more complete name. --Rajah 02:25, 3 August 2006 (UTC)

[edit] Updated Count and German Wikipedia Persondata Link Network

Is it possible for us to get a monthly updated count of how many have the Persondata metadata and also which articles don't have it? Also, this link is an analysis of the German Wikipedia entries and how many incoming links various people have. I think this is really interesting and if we could do a similar analysis with the English Wikipedia, it might give Persondata more of a push. --Rajah 02:29, 3 August 2006 (UTC)

[edit] place of death

What should I do for someone who died at sea?--CMG 22:22, 8 September 2006 (UTC)

Give the sea? --maru (talk) contribs 00:31, 9 September 2006 (UTC)

[edit] missing fields


  • For living people, do we just say

or do we say


or do we leave these out altogether?

  • For people whose year of birth is unknown, what do we do?
  • And how about alternate transliterations? Khalid Shaikh Mohammed was probably born in Pakistan, and used Arabic script to write his name, so all names using Roman letters are approximations. Should ALTERNATIVE NAMES include other transliterations (such as "Khalid Sheikh Mohammad")? If so, should I use just one, with an "et al" or something? For particularly difficult names, there can be dozens of possible transliterations. Check out Mohamed Atta for a more troublesome case.
  • KSM (above) used 27 different aliases. Is what I've done appropriate?

Thanks, – Quadell (talk) (random) 19:12, 21 September 2006 (UTC)

Just leave the date and place of death blank if the person is living. Regarding transliterations: any spellings or transliterations that you think people may commonly use to search for the person should be included under alternative names. Basically, if there is a redirect for it, it should probably be listed. Kaldari 23:05, 21 September 2006 (UTC)

Okay, thanks. But if someone is obviously dead, but we don't know when or where the person died, would we leave these blank as well? If so, then persondata can't be used to determine if someone is alive. – Quadell (talk) (random) 01:04, 22 September 2006 (UTC)

Over at the Vietnamese Wikipedia, I've been placing "?" (without the quotes, of course) in DATE/PLACE OF BIRTH/DEATH if we don't know the information. If we aren't sure, I place "?" after the best guess. I'm not sure what common practice here is, but I've seen several articles that use "living" in the Persondata box. – Minh Nguyễn (talk, contribs) 07:31, 9 October 2006 (UTC)
I've been typing "living" in date of death when the person is alive. If the birth year is c.1922, I either list the year of birth as "c.1922" or "1922?". – Quadell (talk) (random) 22:02, 9 October 2006 (UTC)
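Since several conventions are in use ("?", "living", "c.1922", "1922?", or simply blank), anyone consuming the data will need to tell them apart. Here is a rough Python sketch of how a reader of the data might do that. It is an assumption-laden illustration: the status labels are invented, it only handles bare years rather than full dates, and nothing here is an agreed standard.

```python
import re


def read_year_field(value: str):
    """Classify a year-like field value as (year or None, status label)."""
    text = value.strip()
    if text == "":
        return (None, "blank")
    if text.lower() == "living":
        return (None, "living")
    if text == "?":
        return (None, "unknown")
    # Optional "c." prefix, a year of up to four digits, optional trailing "?".
    match = re.fullmatch(r"(c\.\s*)?(\d{1,4})(\?)?", text)
    if match is None:
        return (None, "unparsed")
    circa, year, query = match.groups()
    status = "approximate" if (circa or query) else "exact"
    return (int(year), status)


for sample in ["", "living", "?", "c.1922", "1922?", "1922"]:
    print(repr(sample), read_year_field(sample))
```

Note that a blank value cannot distinguish "still alive" from "dead, date unknown", which is exactly the ambiguity raised above; an explicit "?" or "living" removes it.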

[edit] List of highly connected people without Persondata

On one of my subpages, User:Rajah/persondata, is a semi-wikified list of the top 1000 linked people in the German Wikipedia. (This is why the list tends to be German-centric, but I still think it's a good start, as most names on the list are universally known.) I've put strikethroughs through the names of those with persondata. Feel free to add persondata to one or more people and then strike through their names. I think this way, a lot of the really "important" people can get persondata in a collaborative fashion with visible feedback. Thank you. --Rajah 06:15, 16 October 2006 (UTC)

[edit] Persondata CSS class

Hi there. There is currently a discussion going on about the CSS rules added to MediaWiki:Common.css for {{Persondata}} at MediaWiki talk:Common.css#metadata. Mike Dillon 22:03, 22 October 2006 (UTC)

I can no longer see persondata information. My monobook.css says "table.metadata {display:table;}", and I'm using Firefox 2.0. I'm not sure if I can't see the data because of Firefox 2 being different than Firefox 1.5, or if someone changed the CSS rules. I tried "table.metadata {display:block;}" (the IE way) to no avail. – Quadell (talk) (random) 12:50, 27 October 2006 (UTC)

[edit] Deleting persondata info

User:Adam Carr left this note on my talk page, after I added persondata to a number of articles.

Yes more ugly and pointless tables and boxes - I will delete these whenever I see them.

What should I do about this? – Quadell (talk) (random) 15:11, 27 October 2006 (UTC)

Apparently, this comes from the fact that he can see the persondata, and he thinks it's ugly. Which is odd, since I can't see it, and I have my monobook.css set up to see it. Can we get the CSS issues worked out soon? – Quadell (talk) (random) 15:29, 27 October 2006 (UTC)
I reverted template:Persondata to use metadata, temporarily. We'll fix it later. – Quadell (talk) (random) 15:48, 27 October 2006 (UTC)

And I just fixed my monobook.css! If .persondata has already been hidden in the main stylesheet, then the problem is probably stylesheet caching. Either way, the best solution for the moment would be to apply both classes to the root element, like so: class="metadata persondata". I don't think both classes would be needed on the inner elements, just the table. Then, when it is time to remove the metadata class (if that is indeed what we want to do), it can be done without breaking any user stylesheets that switch to persondata.

Does anyone else agree that this is what we should do? —TheMuuj Talk 21:45, 27 October 2006 (UTC)

I have my monobook.css set up that way too. But the thing is, it doesn't seem to be a caching issue. I flushed the server cache, and it didn't change anything. Adam Carr, who has never touched his monobook.css, couldn't see the persondata last week, but could this morning (after the persondata template was switched from using "metadata" to "persondata").
If you look at MediaWiki:Common.css, you can see that the persondata class and the metadata class are identical, so they should work the same. But for some unknown reason, they don't. Any ideas? – Quadell (talk) (random) 05:21, 28 October 2006 (UTC)
It's not the server cache that's the problem. It's the browser aggressively caching stylesheets. Generally, pressing Ctrl+F5 will rectify the situation. But since MediaWiki:Common.css defines both metadata and persondata, putting both classes on the table will serve as a temporary solution. This will let those of us who want the persondata to be visible go ahead and use persondata in our stylesheets. Eventually, we'll make the switch and hopefully everybody will have an up-to-date stylesheet in their browser cache. —TheMuuj Talk 14:53, 28 October 2006 (UTC)
Applying two styles is a good idea, but some browsers don't support that syntax. It's probably a better idea just to wait until the caches clear. Kaldari 05:43, 2 November 2006 (UTC)
This page has a good discussion of the multiple class issue. It confirms that pretty much every browser out there supports the class="CLASS1 CLASS2" syntax. Where the support is marginal in some cases is using rules with both CSS classes as a selector, but we aren't doing that here. Mike Dillon 15:11, 6 November 2006 (UTC)
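To make the two-class idea concrete, here is a minimal Python sketch of how a class attribute with two space-separated names is matched by single-class rules. It only models the matching logic; the sample table markup and the helper names are made up for illustration and are not taken from the actual template or from any browser.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collect the set of class names on every <table> start tag."""

    def __init__(self):
        super().__init__()
        self.tables = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            class_value = dict(attrs).get("class") or ""
            # A class attribute is a whitespace-separated list of names.
            self.tables.append(set(class_value.split()))


def matches(class_tokens, selector_class):
    """True if a single-class rule such as .persondata applies."""
    return selector_class in class_tokens


collector = ClassCollector()
# Hypothetical markup, standing in for the table the template emits.
collector.feed('<table class="metadata persondata"><tr><td>NAME</td></tr></table>')
tokens = collector.tables[0]

print(matches(tokens, "metadata"), matches(tokens, "persondata"))
```

With both names on the table, a user stylesheet that targets either class keeps working, which is the point of the temporary solution discussed above.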
Some people are still running around deleting persondata as "cleanup" of articles; it's been done lately with Kellie Pickler and Tiffany (singer). *Dan T.* 20:45, 13 January 2007 (UTC)

[edit] place inside the article

"To use the {Persondata} template, copy the wikitext below to the end of a biographical article. The template should be placed just before the categories and interlanguage links."
I suggest following the style guide of the German Wikipedia, which recommends putting the persondata between categories and interwiki links. The main reason is that I've come across a couple of articles where there were other templates (which sometimes include categories) directly above the categories. Categories and persondata are both meta data, but the persondata are usually not included with other templates (I hope). --32X 11:43, 10 November 2006 (UTC)

That's actually how we had it originally, but we had to change it for some reason. Can't remember why now :P Kaldari 19:29, 10 November 2006 (UTC)

[edit] Range of years

Is it ok to put a range of years for the birthdate, like 1423-1431? 21:42, 14 November 2006 (UTC)

Normally, I think c. (circa) would best fit, and for your example, "c. 1428" would make the most sense. It seems odd for a biographer to know a definitive range of years. Do you have an example of who you are discussing? --Rajah 05:51, 16 November 2006 (UTC)

[edit] Date of birth

Per the privacy discussions at WP:BLP, we should not be listing the date of birth for living, non-public or semi-public figures. We can and should ask for year of birth and can ask for DOB for clearly public figures. Can this template be adjusted or perhaps forked to accommodate the privacy concerns for living semi-public persons? Rossami (talk) 01:51, 17 November 2006 (UTC)

That is not what my reading indicates. The relevant sections seem only to say that sometimes for living persons you might maybe want to consider only listing year. For which concern the current template is fine - one simply omits the month and day. --Gwern (contribs) 02:26 17 November 2006 (GMT)
By "omit", do you mean "ignore" or "take a second step to delete" the line about month and day? My concern is that the base template will be (already has been) added to articles in good faith. New users who are unfamiliar with the privacy concerns see the gap in the template and decide that they will "fix it". They add the full date. Everyone has acted in good faith but now we have content that policy says we probably shouldn't have on the page. The template could have the effect of unintentionally luring good users to adding content inappropriately. Can the template be adjusted to raise awareness about the privacy concerns? Rossami (talk) 16:48, 18 November 2006 (UTC)

[edit] Use?

Has persondata been of any use thus far? Is WP:1.0 using it? I'm just wondering if persondata has benefited some application. I noticed the German WP uses it far more than we do. What have they used it for? 20:27, 19 December 2006 (UTC)

It's useful for database purposes, since it is the only standardized template of that type and scope. --PhantomS 21:07, 19 December 2006 (UTC)

[edit] So What Does This Do Now

Is it proving to be of any use at all on the English Wikipedia right now. I mean, is there any way to look at a database, if there is one, that isn't all script and whatnot? Or are we just starting to add something to pages that will be made eventually? Kaiser matias 09:34, 20 December 2006 (UTC)

It's usually harder to find the birthplace of someone in the en.WP than on de.WP. The box shows that data pretty clearly. (Yes, there are articles where the birthdate is mentioned several times.) --32X 10:57, 20 December 2006 (UTC)
Not to forget, it was used for the German DVDs. But that means there's a need for a bigger base. It doesn't really work if there are only some thousand articles with persondata compared to some hundred thousand without it. --32X 04:43, 22 December 2006 (UTC)

[edit] Placenames

Should these be historically/contextually accurate, or simply follow common modern usage (as we normally do)? For instance, if someone was born in modern-day Seoul in the 1920s, should I enter the birthplace as "Seoul, South Korea" (the modern location)? Or "Seoul, Korea"? Or "Gyeongseong, Korea" (probably the most contextually-appropriate)? Or "Keijo, Chosen"? Just curious -- it would be helpful to have some guidance on this (and other standardization issues) before starting intensive tagging work. -- Visviva 12:36, 1 January 2007 (UTC)

This was asked previously and didn't get much of a response. At the time, I went with contextually accurate. I guess I still feel that way. Mike Dillon 16:43, 1 January 2007 (UTC)
Whenever possible I try to give original names. F.e. when someone was born (or died) in Berlin in the 1950s-1980s era, it's important to know in which part of Berlin that was. --32X 14:28, 4 January 2007 (UTC)

[edit] Category sorting key

I propose adding the following code:


It will automatically set proper category sorting key, see Signpost news for details on how it works. MaxSem 12:08, 6 January 2007 (UTC)

While the DEFAULTSORT is a nice feature, I don't see why it should go to persondata when it's (to my knowledge) only used for categories. There are even some problems (at least I haven't read something different): Gerhard Schröder
|NAME=Schröder, Gerhard
[[Category:Chancellors of Germany|Schroder, Gerhard]]
I guess the defaultsort would change it to [[Category:Chancellors of Germany|Schröder, Gerhard]] and therefore he'd be sorted after the Chancellor of 2020, "Schruder, Günther". --32X 12:35, 6 January 2007 (UTC)
DEFAULTSORT works for everything categorizable, and does not override explicitly set sorting keys, so it will not affect mr. Schröder. However, some of your concerns are valid, and let's wait for more input. MaxSem 12:48, 6 January 2007 (UTC)
Ah, yes, right. --32X 19:21, 10 January 2007 (UTC)
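The ordering concern raised above can be checked with a short Python sketch. It assumes sort keys are compared character by character on raw code points (an assumption about the software, not something stated in this thread) and shows why an unaccented key such as "Schroder, Gerhard" is used. The helper name strip_diacritics is made up for illustration.

```python
import unicodedata


def strip_diacritics(text):
    """Return text with combining accents removed (e.g. o-umlaut becomes o)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Normalise to NFC so each accented letter is one precomposed character.
names = [
    unicodedata.normalize("NFC", name)
    for name in ("Schröder, Gerhard", "Schruder, Günther", "Schmidt, Helmut")
]

# Raw comparison: the o-umlaut has a higher code point than "u",
# so Schröder lands after the fictional Schruder.
raw_order = sorted(names)

# Comparison on accent-free keys: Schröder sorts as "Schroder".
key_order = sorted(names, key=strip_diacritics)

print(raw_order)
print(key_order)
```

This only illustrates the comparison itself; as noted above, an explicitly set sorting key is not overridden by DEFAULTSORT.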

I thought again about that request.

Pro:

  • extra motivation for adding persondata
  • Works for the style of en.WP: persondata are placed before categories and therefore a following DEFAULTSORT wouldn't be a problem.
  • [[:Category:2010 births|15th Dalai Lama]] isn't influenced in any way.

Contra:

  • a DEFAULTSORT before persondata will be overwritten/ignored
  • It's like hiding functionality in some obscure tag, it shouldn't be used.

In my opinion, the pro arguments are pretty good reasons to include DEFAULTSORT. If the contra arguments become a real problem there's still the possibility to remove DEFAULTSORT from persondata and include it by a bot to each affected article. --32X 19:21, 10 January 2007 (UTC)

Another example of when DEFAULTSORT is different than the NAME is for Celtic names, for example McBlah should have a DEFAULTSORT of Macblah, so that Mcs and Macs are listed together in their category. A couple of editors have started making these changes, for an example see Rove McManus. So will DEFAULTSORT default to NAME, and can then be overridden as a separate entry before the categories? You'll have to excuse my limited knowledge of templates!--Steve (Slf67) talk 02:48, 22 February 2007 (UTC)

The name sorting rules on English Wikipedia don't provide for sorting Celtic names together, so this shouldn't be done. See Wikipedia:Categorization of people#Ordering names in a category, which states "People with multiple-word last names: sorting is done on the entire last name as usually used in English, in normal order" (emphasis mine). Mike Dillon 03:40, 22 February 2007 (UTC)
These are not multiple word last names. They are Celtic based names that are always grouped together in any list; for sorting purposes, surnames with Mc, O', and Fitz should be categorized as, for instance, Macmanus (not McManus), Oneill (not O'Neill) and Fitzwilliam (not FitzWilliam). This is just another example where category sorting is different from the article name, and the guidelines need updating to reflect this. --Steve (Slf67) talk 07:33, 22 February 2007 (UTC)
I understand that. However, sorting them together is not an English-language convention, it's a Celtic-specific convention. Ordinary readers of English will expect to find "McManus" after "MacManus" and "Martin" if they don't have a background in Celtic languages and the sorting should be done in such a way as to violate expectations for the smallest number of readers. The spirit of the name ordering guideline is not specific to multiple word last names, but to the expectations of everyday readers of English. That being said, I agree about the apostrophe and capitalization case for "O'Neill" and "FitzWilliam", since that is more akin to sorting Márquez as "Marquez". Mike Dillon 14:50, 22 February 2007 (UTC)
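The two positions in this thread differ only in whether "Mc" is folded into "Mac". A minimal Python sketch of such a sort-key function follows; the function name and the merge_mc_with_mac flag are hypothetical, and it handles single surnames only (multi-word surnames would need more care).

```python
import unicodedata


def sort_key(surname, merge_mc_with_mac=False):
    """Build a category sort key from a single surname.

    Always applied: drop accents, drop apostrophes, capitalise only the
    first letter (O'Neill -> Oneill, FitzWilliam -> Fitzwilliam).
    Optional: fold a leading "Mc" into "Mac" (McManus -> Macmanus).
    """
    decomposed = unicodedata.normalize("NFD", surname)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    plain = plain.replace("'", "").replace("\u2019", "")
    if merge_mc_with_mac and plain.startswith("Mc"):
        plain = "Mac" + plain[2:]
    return plain.capitalize()


surnames = ["McManus", "MacManus", "Martin"]
plain_order = sorted(surnames, key=sort_key)
merged_order = sorted(surnames, key=lambda s: sort_key(s, merge_mc_with_mac=True))

print(plain_order)
print(merged_order)
```

Without the flag, "McManus" comes after "MacManus" and "Martin", as described in the last comment; with it, the Mc and Mac names are grouped together.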

[edit] Template:Pharaoh Infobox

The {{Pharaoh Infobox}} template is including {{Persondata}} at the top of pharaoh articles... Mike Dillon 00:46, 13 January 2007 (UTC)

Yes, we need to resolve this. I asked on the Pharaoh infobox talk page that the Persondata template be removed from the Infobox. We need to follow up on this. --Rajah 22:05, 1 April 2007 (UTC)

[edit] Template:Persondata edit request

Could a sysop please add the line <!-- Metadata: see [[Wikipedia:Persondata]] --> to the usage section on {{Persondata}}, right after the <pre> tag? This would make it in line with the example given in the Wikipedia:Persondata#Using the template section of this page, and would make it easier for those who don't know about this system to figure it out. Thanks. Picaroon 04:25, 14 January 2007 (UTC)

Done. Luna Santin 19:39, 15 January 2007 (UTC)
Thanks. Picaroon 20:40, 15 January 2007 (UTC)
Adding the comment to the template doesn't actually do anything as the comment is not viewable either in the article view or the editing view. Perhaps it would be useful to add an actual note into the template that is not an HTML comment. Kaldari 23:18, 24 January 2007 (UTC)
An HTML comment is the only way to handle it. Persondata is not visible; it's just a textual note within the window as to what it is. Ral315 (talk) 00:35, 25 January 2007 (UTC)
My point is the HTML comment is only useful if it is outside the template, rather than inside. If it's inside the template, you'll never see it since HTML comments in templates are not displayed in editing mode. Thus the recent edit to the template should be reverted. Kaldari 02:52, 25 January 2007 (UTC)
Compare it yourself: before after
It's useful for copy/paste. Editors who are unfamiliar with {{persondata}} know where to have a look at for more information (because HTML comments _are_ visible in edit mode). --32X 05:07, 25 January 2007 (UTC)
My mistake. I thought the usage notice had been added to the template itself. Kaldari 18:14, 25 January 2007 (UTC)

Hi Kaldari. Could you edit the template to link to Template:Persondata/doc, following the template doc page pattern? Mike Dillon 18:46, 25 January 2007 (UTC)

Where's the actual benefit of it? The doc page contains less information. --32X 22:22, 25 January 2007 (UTC)
I'm not sure what you're asking, but the benefit over the current situation is that the doc portion will be editable by anyone while keeping the template itself protected. Mike Dillon 22:53, 25 January 2007 (UTC)
Ok, that's an argument. But wouldn't it be better to set a redirect to Wikipedia:Persondata since that page is all about the template? (If one knows about the template, the short form for copy/pasting is enough; otherwise the introduction is a "must read".) --32X 23:41, 25 January 2007 (UTC)

[edit] Siblings and parents

Can we add siblings and parents as a cat? That way if the info is removed from the article, at least the info will be easily found by those who need the info. The info doesn't have to display, but it's a good place to store it. The biography infobox has this information but it displays all answers. This way the info could be not displayed and still be available for researchers. Answer at my page please. --Richard Arthur Norton (1958- ) 20:45, 24 January 2007 (UTC)

This seems like a bad idea; persondata has been standardized for the most part. Ral315 (talk) 23:37, 25 January 2007 (UTC)

[edit] hCard microformat

It should be relatively trivial to arrange to have "Persondata" published with hCard microformat mark-up, simply by applying some standard class names to its containing elements. The data could then be extracted by a variety of parsing tools. Please see also Wikipedia:WikiProject Microformats. Andy Mabbett 20:59, 28 January 2007 (UTC)

[edit] Project proposal to link to WorldCat Identities

Anyone interested in a proposed project to link to WorldCat Identities is invited to leave comments or sign up at the project proposal page. WorldCat Identities provides pages for 20 million 'identities' (authors and persons who are the subjects of published titles in WorldCat). Several thousand of these pages provide links to Wikipedia biographical pages: providing links in the other direction would allow readers of Wikipedia biographical articles to move straight to associated library information held in WorldCat libraries. Dsp13 15:17, 20 February 2007 (UTC)

[edit] Template:Birth date and age

For the "Date of Birth" parameter, should we use {{birth date and age}} or should we stay clear of this? --WillMak050389 01:10, 5 March 2007 (UTC)

I would avoid it. Any application using Persondata is likely to be working with the wiki-text directly, which means it will see {{birth date and age|1967|07|15}} rather than July 15, 1967 (age 39). The idea with Persondata is to make it easier for automatic extraction of data; either of these is yet another format your parser has to handle. In any case the age is more useful to human readers; given the birthdate any program can easily calculate the age. Dr pda 01:39, 5 March 2007 (UTC)
Thanks, I wasn't sure, but this makes sense. I'll change the ones I've edited. --WillMak050389 01:43, 5 March 2007 (UTC)
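A minimal Python sketch of the point made above: a tool reading the wikitext sees the template call, has to parse it as one more date format, and can then work out the age itself. The regular expression covers only the simple three-parameter form quoted in this thread; the function names are made up for illustration.

```python
import re
from datetime import date

# Matches only the plain form {{birth date and age|YYYY|MM|DD}}.
TEMPLATE_RE = re.compile(
    r"\{\{\s*birth date and age\s*\|\s*(\d{4})\s*\|\s*(\d{1,2})\s*\|\s*(\d{1,2})\s*\}\}",
    re.IGNORECASE,
)


def parse_birth_template(wikitext):
    """Return the birth date found in the template call, or None."""
    match = TEMPLATE_RE.search(wikitext)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)


def age_on(birth, today):
    """Age in whole years on the given day."""
    had_birthday = (today.month, today.day) >= (birth.month, birth.day)
    return today.year - birth.year - (0 if had_birthday else 1)


birth = parse_birth_template("{{birth date and age|1967|07|15}}")
age = age_on(birth, date(2007, 3, 5))
print(birth, age)
```

For the example in this thread, the age on 5 March 2007 comes out as 39, matching the rendered text quoted above.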

[edit] Half-automatic tagging with persondata-tool

I come from the German Wikipedia. As of January 24th 2007, 126,332 of 133,221 persons (94.8%) are tagged with persondata. A very useful utility is the persondata-tagging-tool from Apper. It automatically extracts birthdate, birthplace etc. from the article, and the only thing the user has to do is check that it's correct and then save it. If someone from your project asks him, maybe he will help you with his tools so you can tag your articles much more easily and quickly. Bones 22:57, 15 March 2007 (UTC)

I'm actually almost finished writing a script to do a similar thing, although it requires the article to have an Infobox from which the data is then extracted, rather than getting the data from the lead of the article. However there are still around 50 000 articles using one of the top 20 or so people-infoboxes (e.g. {{Infobox Football biography}}, {{Infobox musical artist}}), which is about ten times the current number of articles with persondata.
It is more difficult to extract the information from the text of the article (i.e. without an infobox) compared to the de wiki, since on the en wiki the birth/death places are typically not given in a predictable place, i.e. the opening sentence. Compare the first sentences of de:Alfred Hitchcock and Alfred Hitchcock
  • Sir Alfred Joseph Hitchcock KBE (* 13. August 1899 in Leytonstone; † 29. April 1980 in Los Angeles) war ein Filmregisseur und Filmproduzent britischer Herkunft.
  • Sir Alfred Joseph Hitchcock, KBE (August 13, 1899 – April 29, 1980) was a highly influential film director and producer who pioneered many techniques in the suspense and thriller genres.
Hopefully I will have time this weekend to get the script finished. Dr pda 01:27, 16 March 2007 (UTC)
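To illustrate why the German opening sentence is easier to mine, here is a rough Python sketch of a pattern for the "(* date in place; † date in place)" convention shown above. It is an assumption-laden illustration only: it is not the logic of Apper's tool or of the script mentioned here, and it handles just this one fixed layout.

```python
import re

# Pattern for "(* 13. August 1899 in Leytonstone; † 29. April 1980 in Los Angeles)".
# \u2020 is the dagger sign used for the death date.
LEAD_RE = re.compile(
    r"\(\*\s*(?P<birth_date>\d{1,2}\.\s+\w+\s+\d{4})\s+in\s+(?P<birth_place>[^;)]+);"
    r"\s*\u2020\s*(?P<death_date>\d{1,2}\.\s+\w+\s+\d{4})\s+in\s+(?P<death_place>[^)]+)\)"
)


def extract_from_lead(sentence):
    """Return a dict of the four fields, or None if the layout is absent."""
    match = LEAD_RE.search(sentence)
    if match is None:
        return None
    return {key: value.strip() for key, value in match.groupdict().items()}


german = (
    "Sir Alfred Joseph Hitchcock KBE (* 13. August 1899 in Leytonstone; "
    "\u2020 29. April 1980 in Los Angeles) war ein Filmregisseur."
)
english = (
    "Sir Alfred Joseph Hitchcock, KBE (August 13, 1899 \u2013 April 29, 1980) "
    "was a highly influential film director."
)

print(extract_from_lead(german))
print(extract_from_lead(english))
```

The English sentence yields nothing here, since the places are not in the parentheses at all, which is the difficulty described above.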
OK, I've finished the script now. Instructions for use are at User talk:Dr pda/persondata.js. It also includes a tidied-up version of the javascript above for turning persondata on/off without editing your monobook.css. Sample results of using the script are here.
This is a very nice tool - thanks! However, at present it seems to insert the persondata at the top of the article, rather than before categories. No, sorry, it puts everything in the right place! Dsp13 12:17, 19 March 2007 (UTC)
Or rather, it puts the persondata in almost (but not quite) the right place whenever there is a defaultsort template introducing the categories - see my query below. Dsp13 21:51, 19 March 2007 (UTC)
By the way I've also got the extraction from the XML dump more-or-less working by modifying the scripts linked at WP:PDATA#Extraction from the XML dump (the last step is deciding whether to write code to parse the dates which are currently giving errors, or just change the data in the article). I don't have an appropriate place to put the scripts on the web, but if anyone wants a copy email me. Dr pda 01:50, 19 March 2007 (UTC)
User:SEWilco has left this plea, which sounds reasonable, on my talk-page: 'Please do not have your script call itself "this script". That makes reading and searching edit summaries much more difficult.' Could a simple alteration to the script be made? Dsp13 09:14, 1 April 2007 (UTC)
I've changed the edit summary; it now reads adding persondata using User:Dr pda/persondata.js. I'm not entirely convinced the previous edit summary was difficult to read (compare 'reverted vandalism using popups', 'renaming category per CFD with AWB' etc); anyone interested in knowing which script would click the link, anyone not interested would just be able to see it was done with a script. As for causing difficulty in searching through edit summaries, there should only be one instance of it in an article's history. Users of the script will need to refresh their monobook.js to pick up the change. Dr pda 12:24, 1 April 2007 (UTC)

[edit] Query re positioning of persondata before categories

Where categories are immediately preceded by a Template:DEFAULTSORT, should the persondata go between the defaultsort template (which seems the strict reading of 'immediately before categories', but confusingly splits the defaultsort template from the categories it is concerned with) or immediately before the defaultsort template (which seems more natural to me, but should be specified if that is what is to be recommended)? Dsp13 12:33, 19 March 2007 (UTC)

In my opinion {{DEFAULTSORT}} is not a real but a meta-template which directly belongs to categories. I don't see the problem here, but to avoid any confusion I've added a comment. --32X 21:19, 19 March 2007 (UTC)
Thanks. I've modified the script to place the persondata before the {{DEFAULTSORT}} template if it exists. You may need to refresh your monobook.js to pick up the changes. Dr pda 23:04, 19 March 2007 (UTC)
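The placement rule settled on here (before {{DEFAULTSORT}} if present, otherwise before the first category) can be sketched in Python as follows. This is not the code of the script mentioned above, just an illustration of the rule; the function name and the sample article text are made up.

```python
import re

# First line that starts a DEFAULTSORT or a category link.
ANCHOR_RE = re.compile(r"^\{\{DEFAULTSORT:|^\[\[Category:", re.MULTILINE)


def insert_persondata(wikitext, persondata_block):
    """Insert the persondata block before DEFAULTSORT/categories.

    If neither is found, the block is appended at the end.
    """
    match = ANCHOR_RE.search(wikitext)
    if match is None:
        return wikitext.rstrip("\n") + "\n\n" + persondata_block + "\n"
    position = match.start()
    return wikitext[:position] + persondata_block + "\n" + wikitext[position:]


# Hypothetical article tail and persondata block, for illustration only.
article = "Article text.\n\n{{DEFAULTSORT:Hugo, Victor}}\n[[Category:French poets]]\n"
block = "{{Persondata\n|NAME=Hugo, Victor\n}}"
result = insert_persondata(article, block)
print(result)
```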

[edit] If you see someone removing persondata templates...

...you can now tell them not to do it again, by putting {{subst:pdataremove-warn}} on their user talk page. They will also be pointed here for more information on persondata. Resurgent insurgent 03:33, 25 March 2007 (UTC)

[edit] Why is persondata separate to infobox?

Further to my above comment about hCard, please can someone explain to me the purpose and advantage of having persondata in a separate, hidden-by-default table instead of having the same, standard fields in the output of the various infobox templates? What tools exist to parse persondata, inside or outside Wikipedia? Andy Mabbett 00:59, 26 March 2007 (UTC)

The {{persondata}} isn't a real information box but meta data. It was introduced for the first DVD of the German Wikipedia. The data field is pretty easily accessible with direct SQL (when you have downloaded an image) and therefore allows search operations. With a large article base (more than 100,000 in de.WP) it allows you to do SQL operations like, for example, searching for birth places whose articles aren't written yet. Some time ago I read about several tools, but because I didn't feel the need I haven't used them. --32X 18:50, 28 March 2007 (UTC)
Thank you for the explanation. The use-case makes sense, but it seems to me that this could be achieved just as easily, by using hCard, and hCard-like classes, in infoboxes, instead of repeating the information separately; and that that would have additional advantages for readers and editors, through greater interoperability with other tools and websites and ease of authoring. It would also facilitate persondata-like metadata for organisations and venues, through their infoboxes. I'm happy to advise further, if anyone's interested in pursuing this possibility. Andy Mabbett 19:16, 28 March 2007 (UTC)
To clarify issues in my own mind, I've drawn up a comparison of persondata and hCard properties, on the microformats wiki. Andy Mabbett 19:49, 28 March 2007 (UTC)
A good reason is that someone using Persondata usually has read this page and knows what they're doing. It is far more common for people to mess up and misuse infobox, which would garble the metadata. Circeus 19:03, 28 March 2007 (UTC)
Like any bad edit, surely that can be remedied? Andy Mabbett 19:16, 28 March 2007 (UTC)
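As a rough illustration of the SQL use case described at the top of this thread (finding birth places that have no article yet), here is a small self-contained Python sketch using an in-memory SQLite database. The table layout, column names and the "Exampleville" rows are all invented for the example; they are not the schema of any real dump or tool.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE persondata (article TEXT, name TEXT, place_of_birth TEXT);
    CREATE TABLE page (title TEXT PRIMARY KEY);
    """
)

# Hypothetical extracted persondata rows.
conn.executemany(
    "INSERT INTO persondata VALUES (?, ?, ?)",
    [
        ("Alfred Hitchcock", "Hitchcock, Alfred", "Leytonstone"),
        ("Victor Hugo", "Hugo, Victor", "Besan\u00e7on"),
        ("Example Person", "Person, Example", "Exampleville"),
    ],
)

# Hypothetical list of existing article titles.
conn.executemany(
    "INSERT INTO page VALUES (?)",
    [("Alfred Hitchcock",), ("Victor Hugo",), ("Leytonstone",), ("Besan\u00e7on",)],
)

# Birth places that appear in persondata but have no page of that title.
missing = conn.execute(
    """
    SELECT DISTINCT p.place_of_birth
    FROM persondata AS p
    LEFT JOIN page ON page.title = p.place_of_birth
    WHERE page.title IS NULL
    ORDER BY p.place_of_birth
    """
).fetchall()

print(missing)
```

A query like this only becomes useful with a large tagged base, which is the point made about the German Wikipedia above.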

The issue of persondata vs infoboxes has been raised several times on this talk page, see #Use inside implementations of other templates, #Not picked up by Google?, #Hidden Metadata, #Revisiting Infobox Person and #Why is this seperate from Infobox Person?. Some of the main arguments given against combining them are

  • This would require every biography to have an infobox, which many editors are opposed to.
  • There are a large number of different infoboxes (approx 160), not all of which have all the fields of persondata, and which currently vary greatly in the names for the fields they do have.
  • Persondata takes names in the format surname, firstname in order to be able to create an alphabetical list by surname.

There are examples at WP:PDATA#Extraction of persondata of how to extract persondata from an SQL database, or scripts to extract and parse it from the WP XML dump and insert it into a MySQL database, on which you can then run all kinds of queries (these scripts are written for the de wiki but I have more or less adapted them to the en wiki following the hints there, see my comments above).

I notice that your comparison of infobox/persondata/hCard at the microformat wiki is expressed in terms of the rendered (X)HTML of the page; both the previous methods for extracting persondata work with the raw wiki markup, i.e. DATE OF BIRTH = 22 May 1977 rather than <abbr class="bday" title="1977-05-22">22 May 1977</abbr>. Using hCard would then seem to imply a lot of HTML-scraping to get the data, rather than using the periodic database dumps. (There are over 200,000 biographies, though admittedly only a quarter or so have infoboxes and only about 7000 currently have persondata.) Looking at the list of hCard implementations here it seems that most of these implementations deal with recognising hCards on an individual webpage/converting to vCards/adding to address books etc, rather than dealing with large collections of hCards (which would be the end goal of an equivalent to persondata), although I suppose some of the PHP tools could also be used to populate a database. I also notice that hCard does not yet support the date of death and place of birth/death fields, which would seem to argue against its immediate implementation in place of persondata. Perhaps the best way of combining persondata with hCard (if you want to go there at all) would be, as you originally suggested, adding extra class tags in the persondata template itself. Dr pda 15:10, 31 March 2007 (UTC)
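For readers unfamiliar with what "working with the raw wiki markup" means in practice, here is a deliberately naive Python sketch that pulls the persondata fields out of wikitext. It is not one of the scripts linked from WP:PDATA; the function name is made up, and the splitting on "|" would break on values containing piped links or nested templates.

```python
import re

# Non-greedy match up to the first "}}", so nested templates are not supported.
PERSONDATA_RE = re.compile(r"\{\{\s*Persondata\s*(.*?)\}\}", re.DOTALL | re.IGNORECASE)


def parse_persondata(wikitext):
    """Return a dict of FIELD -> value from the first persondata block, or None."""
    match = PERSONDATA_RE.search(wikitext)
    if match is None:
        return None
    fields = {}
    for part in match.group(1).split("|"):
        if "=" in part:
            key, value = part.split("=", 1)
            fields[key.strip()] = value.strip()
    return fields


# Sample block written for this example.
sample = """Article text.

{{Persondata
|NAME=Hitchcock, Alfred
|DATE OF BIRTH=13 August 1899
|PLACE OF BIRTH=Leytonstone
|DATE OF DEATH=29 April 1980
|PLACE OF DEATH=Los Angeles
}}
[[Category:English film directors]]
"""

fields = parse_persondata(sample)
print(fields)
```

The contrast with hCard is that nothing here needs the rendered HTML; the dump text is enough.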

Thank you for your detailed response. I appreciate that this must be old ground for some people, but I trust that you will agree that consideration of microformats makes it worth revisiting. I'll address your points as bullets, for the sake of convenience and clarity:

  • "This would require every biography to have an infobox, which many editors are opposed to" - I would question why they're opposed, and whether they're perhaps putting personal (aesthetic?) preferences before the convenience of users. That said, perhaps, one day, it might be possible for user preferences to include a "do not display infoboxes" option, like the current "do not show TOCs" option.
  • "There are a large number of different infoboxes (approx 160), not all of which have all the fields of persondata, and which currently vary greatly in the names for the fields they do have" - I think there's a case for some standardisation here; perhaps a root "persondata" template, to be included in other biographical infobox templates, in the same way that "coor" is included in a number of other location- related infoboxes.
  • "Persondata takes names in the format surname, firstname" - It's possible for software to convert from one format to the other; or for the data entry to be in two (or more) fields (there's experience of doing this for the name field in hCard).
  • It should be possible for XML to be dumped from infoboxes/ hCards if required.
  • "it seems that most of these implementations deal with recognising hCards on an individual webpage" - most, but not all, and these are just the "early adoptions"; there's - deliberately - plenty of scope for other use cases.
  • "I also notice that hCard does not yet support the date of death and place of birth/death fields" - yes, but the comparison page you cite suggests a work-around for that.
  • "adding extra class tags in the persondata template itself" - hCards (indeed, all microformats) are intended for data that is visible on the page; not for hidden metadata.
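On the name-format point in the list above, a minimal Python sketch of the conversion in both directions. The function names are made up; note that going from a display name back to "surname, firstname" is only reliable when the parts are held in separate fields, which is why the second function takes two arguments.

```python
def to_display_name(sort_name):
    """Turn 'Surname, Given names' into 'Given names Surname'."""
    if "," not in sort_name:
        # Single names (no comma) are returned unchanged.
        return sort_name.strip()
    surname, given = sort_name.split(",", 1)
    return f"{given.strip()} {surname.strip()}"


def to_sort_name(given, surname):
    """Build the persondata-style form from separately stored fields."""
    return f"{surname.strip()}, {given.strip()}"


display = to_display_name("Hitchcock, Alfred")
sortable = to_sort_name("Alfred", "Hitchcock")
print(display)
print(sortable)
```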

Finally, being naturally lazy, I believe strongly in both not reinventing the wheel, and not doing work (i.e. entering data) twice.

Cheers, Andy Mabbett 19:33, 31 March 2007 (UTC)

P.S. Even while I was typing the above, The Anome was adding, on the Microformats Project talk page:

This a bootstrapping effort at the moment, and you won't see any extra utility in the very short term: but once there's a substantial amount of semantically-tagged content on Wikipedia, some very interesting things will start to happen...

Andy Mabbett 19:39, 31 March 2007 (UTC)

[edit] Persondata box & succession box display

In the case of Victor Hugo, the displayed persondata box gets mixed up together with an immediately preceding succession box. Anyone know why, or how to fix it? Dsp13 12:12, 31 March 2007 (UTC)

There was a missing {{end box}} template after the succession box. It's fixed now. Dr pda 15:10, 31 March 2007 (UTC)

[edit] Gregorian/Julian calendar shift

How best to handle old-style dates? At the moment with Samuel Johnson I've left a template for old-style dates in his birth year, but (per discussion of dates above) I'd rather leave something more transparent in the wikitext. Dsp13 12:54, 31 March 2007 (UTC)
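For anyone wanting to check an old-style date by hand, here is a small Python sketch that converts a Julian calendar date to the Gregorian calendar via the Julian Day Number. It uses the standard integer formula for the Julian calendar and Python's proleptic Gregorian ordinals; it ignores the separate question of years that began on 25 March. The function names are made up for illustration.

```python
from datetime import date


def julian_day_number(year, month, day):
    """Julian Day Number of a date given in the Julian calendar."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def julian_to_gregorian(year, month, day):
    """Convert a Julian calendar date to a (proleptic) Gregorian date."""
    # date.fromordinal(1) is 0001-01-01, whose Julian Day Number is 1721426.
    return date.fromordinal(julian_day_number(year, month, day) - 1721425)


# Samuel Johnson's birth is usually given as 7 September 1709 old style.
new_style = julian_to_gregorian(1709, 9, 7)
print(new_style)
```

Storing the converted date in plain text, with the old-style date noted in the article body, would keep the wikitext transparent for parsers.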

[edit] Transcluded persondata?

Ramesses II has persondata somehow 'transcluded' onto the page. I'm not quite sure how this works, or if it's desirable. Any thoughts? Dsp13 21:28, 1 April 2007 (UTC)

It is because the Pharaoh Infobox contains the persondata template. Wikipedia_talk:Persondata#Template:Pharaoh_Infobox --Rajah 23:25, 1 April 2007 (UTC)