Wikipedia:Request a query/Archive 5 Source: en.wikipedia.org/wiki/Wikipedia:Request_a_query/Archive_5
This is an archive of past discussions on Wikipedia:Request a query. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current main page.
I realize PetScan can be used to show links that appear on two pages, but I need it to show such a result sorted by the order in which the links appear on the first page. Based on my limited understanding of the database, I don't think the order of the links on a page is tracked. But just in case I'm wrong, can someone show me a query to do this? Or suggest an alternative approach? Thanks. Stefen Towers among the rest! Gab • Gruntwerk 23:49, 22 June 2024 (UTC)
No, that's not possible; Quarry doesn't contain information about the content of the page.
Could you give some additional information about what you want this for? There are several alternative approaches that would work, but I need to know a little more about your use case. BilledMammal (talk) 23:53, 22 June 2024 (UTC)
It can see which pages a given page links to (or is linked from), but doesn't have any information on what order it's done in - that would need the page content. Is this a one-time query, or will you need it repeated? —Cryptic 00:08, 23 June 2024 (UTC)
I see you want something that can run repeatedly. I don't have time right now to put something together for you, but if Cryptic doesn't come up with something I'll do it sometime in the next couple of weeks - if I don't, feel free to remind me on my talk page. BilledMammal (talk) 00:24, 23 June 2024 (UTC)
Thanks for the list - that's good for a start. Would you mind giving me a few clues on your approach for the script you did? It might snap me into figuring it out. Also, I have thought of using a spreadsheet or text-compare software, but was hoping for an on-wiki or otherwise online approach. Stefen Towers among the rest! Gab • Gruntwerk 00:40, 23 June 2024 (UTC)
A few regex operations to get just the links in order, and then a basic Python script that walks down the first list and outputs each item that also exists on the second. Unfortunately, nothing online ATM. BilledMammal (talk) 00:49, 23 June 2024 (UTC)
Any pure-sql-against-the-wmf-databases approach would have to start with something similar to either "Make a page in your userspace with redlinks to 1!Tom Cruise, 2!Muhammad Ali, ... 1000!Frank Torre" or "manually edit this stupidly long query that includes all that data" (like how quarry:query/81948 includes the namespace names, but with a thousand items instead of 30). —Cryptic 01:04, 23 June 2024 (UTC)
Indeed, those don't seem like tenable approaches. But this discussion has helped me figure out a solution, not optimal but workable:
Copy the popular articles' wikitext into a text editor, and break it down into a flat list using a macro I constructed with regex and other tricks.
Use PetScan to create a flat list of articles with old issues (this didn't have to be in any particular order).
Insert both flat lists into their own columns in a spreadsheet, then find matches of first-column entries in the second column, then apply a filter on matches, and voilà.
In 2013, I heard that about a quarter of newcomers' first edits were to create a page, and three-quarters were to edit an existing page. (My own first edit was to a Talk: page, which was a distinctly unpopular choice.)
For editors who made their first edit during 2023, is it possible to find out how many of those first edits were page creations vs editing existing pages, and which namespaces they happened in? I imagine a set of numbers like this:
Lots of ways for this to go wrong. Consider deleted edits, and that there's no way to reliably tell where a page was when an edit was made to it (unless it's also in the page creation log, but that still means at least half the data is bad). Also no way to tell which of two deleted edits with the same timestamp was actually earlier, but a user making a second edit in the same second as their first is going to be rare enough that we can just pick one. Getting a list of users whose first edit was in 2023 is impractical; a list of users who were created in 2023 and have made at least one edit is at least close, so that's what I'm doing. quarry:query/84486 should finish in about 15 minutes. It may or may not be right - I haven't seen the results yet - but I've got to get to bed. Will take another look tomorrow. —Cryptic 07:47, 29 June 2024 (UTC)
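For illustration, a much-simplified sketch of the live-edits-only half of that approach (ignoring the deleted-edit and same-timestamp handling described above; the correlated subquery makes it slow, so treat it as a shape rather than the production query):
SELECT page_namespace,                               -- where the page is now, per the caveat above
       SUM(rev_parent_id = 0) AS created_page,       -- rev_parent_id = 0 marks a page creation
       SUM(rev_parent_id <> 0) AS edited_existing_page
FROM user
JOIN actor ON actor_user = user_id
JOIN revision ON rev_actor = actor_id
JOIN page ON page_id = rev_page
WHERE user_registration LIKE '2023%'                 -- accounts created in 2023
  AND rev_timestamp = (SELECT MIN(r2.rev_timestamp)  -- that user's first live edit
                       FROM revision r2
                       WHERE r2.rev_actor = actor_id)
GROUP BY page_namespace;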
Hm, the corrected query finished a lot faster than the obviously-broken ones I ran before (which is usually to be expected, but I didn't think the data would cache that well). Something's not quite right - there shouldn't be that row with the completely blank namespace and 52458 existing-page edits, for starters, and the one creating a page in the MediaWiki namespace looks really suspicious - but otherwise the numbers seem at least plausible. —Cryptic 07:59, 29 June 2024 (UTC)
If we ran that for, say, last week, would that give us an estimate of how many first edits are being 'lost' or 'misplaced' due to deletion and moving pages? WhatamIdoing (talk) 03:34, 30 June 2024 (UTC)
Well, I can run the numbers (quarry:query/84512), but I don't know how much to infer from the comparison with the longer time period. The only edits that are going to be 'lost' are ones that have been revdeleted or oversighted, and in most cases - other than, perhaps, for copyvio revdels - that will happen pretty quickly after they're made. I wouldn't care to guess how quickly a typical page gets moved between namespaces after creation, either. —Cryptic 08:17, 30 June 2024 (UTC)
Does it only lose revdel edits, and not ordinary/whole page deletion?
Just revdelled and suppressed edits. Everything about deleted pages and edits except for edit summaries and page text is in the public database replicas; accounting for those is most of why the query is as complex as it is. —Cryptic 10:52, 2 July 2024 (UTC)
So here are a few things I notice:
Last week, 4,754 newbies made their first edit on an existing page and 1,842 newbies made their first edit to a new page.
Last week, if your first edit was to an existing page, then about 90% of the time, it was to the [i.e., a page most recently in] mainspace.
Last week, if your first edit was to create a new page, then about 70% of the time, it was to the User: namespace and about 20% of the time, it was to the Draft: space.
Within the first week, about a third of User: space pages and a third of Draft: space pages get deleted.
In 2023, 342,087 newbies made their first edit to an existing page and 144,181 newbies made their first edit to a new page.
In 2023, if your first edit was to an existing page, then about 90% of the time, it was to the mainspace.
In 2023, if your first edit was to create a new page, then about 70% of the time, it was to the User: namespace and about 20% of the time, it was to the Draft: space.
Within the last 18 months, looking at pages created on the first edit during 2023, about a third of User: space pages and almost 95% of Draft: space pages were deleted.
Conclusion from last week vs 2023: If your User: space page is going to get deleted, it'll happen in the first week. Draft pages are much more likely to get deleted, but it takes longer.
Within the last 18 months, only 1.6% of pages created as the account's first-ever edit were moved to the mainspace. 20% of those were also subsequently deleted. One in 75 first-edit page creations is still visible in the mainspace.
Actual User: pages probably aren't meant to get moved to the mainspace, and the existence of these pages depresses the overall 'success' numbers. An unknown proportion of User sandbox pages probably are meant to move to the mainspace, while others are probably meant to be used for test edits. We don't have a good way to differentiate between these two types of user-space contribution.
Within the last 18 months, looking at pages created on the first edit during 2023, about a third of User: space pages and almost 95% of Draft: space pages were deleted. - this is a shocking statistic but if I understand correctly it can't be taken at face value because the namespace is where the page is now, right? That is, 95% of drafts that were never moved to mainspace are deleted, which is explicable in terms of G13 (I assume the 5% is drafts less than six months old?) – Joe (talk) 20:35, 2 July 2024 (UTC)
@Joe Roe (mobile), all of the drafts created in 2023 are more than six months old at this point. There were probably one or two drafts that were a few hours short of being six months old when the query was run, so it's like 0.05%. The 5% that have been retained were probably created later in the year and had an edit made (by anyone) since then. For example, the 5% probably includes articles that were submitted to AFC in December, declined in January, and will be deleted – just not quite yet.
The namespace listed is the namespace in which either the page exists now (more specifically, at 30 Jun 2024 08:21:10 UTC) or where the page existed at the time that it was deleted (e.g., ordinary CSD or AFD, not revdel or OS/suppression – those latter ones are invisible to the query). WhatamIdoing (talk) 21:14, 2 July 2024 (UTC)
This can't be done with a sql query - interwiki links aren't in the main database, and jawiki is on a different database server than enwiki. Someone more familiar with WP:PetScan than I might be able to get a result with it. Also worth asking at d:WD:RAQ; they do have access to the interwikis over there, and I seem to recall there's a way to get at wikiproject ratings as well. —Cryptic 17:13, 16 July 2024 (UTC)
Erm. Wow. That's egg on my face. I'm sure I've looked for them before, and had thought that table went away when language links were migrated to Wikidata. Hang on. —Cryptic 17:25, 16 July 2024 (UTC)
I'm pretty sure not, unless it was depopulated by a bot, or was old enough that you can find the cat page with the Wayback Machine. —Cryptic 19:27, 17 July 2024 (UTC)
Composition categories without templates
Hi all, if possible, I'd like to request a list of any subcategories in Category:Compositions by composer (which have more than 2 entries) in which there is no corresponding navigational box created for said composer's works.
So essentially, I'd be looking for categories like Category:Compositions by Example Person which have more than 2 pages but no corresponding {{Example Person}} template. I'd then go through the list and create the missing templates myself.
Thank you so much @Cryptic, that should work well. Question: will this link stick around or should I copy the results somewhere for my own use? Aza24 (talk) 00:24, 29 July 2024 (UTC)
It'll stay in place (and the results won't change even when the underlying data here does). It would probably be convenient to copy it into a sandbox page, though, so you can click on the links directly. The cyan "Download data" dropdown has an option for wikitable format. —Cryptic 00:29, 29 July 2024 (UTC)
Not possible with a query: the table showing transclusions doesn't have duplicates (it's the same one used to generate Special:Whatlinkshere), and the replicas don't have the page text. Best I can think of is using search one template at a time, like so. —Cryptic 18:58, 27 July 2024 (UTC)
I'm interested in finding the articles that make heavy use of Template:IETF RFC. Would it be possible to rank the top-10 articles with the most calls ("transclusions"?) of that template, please? Thanks! fgnievinski (talk) 01:20, 15 August 2024 (UTC)
It might be feasible to do it manually, though - there's only 817 articles that transclude it. You could scrape the wikitext for all of them, parse them for transclusions, and count those. There's only a couple redirects to confuse things, though if there's any indirect transclusions (i.e., through a different template), that'd make things harder. —Cryptic 01:35, 15 August 2024 (UTC)
Both of those projects are almost certainly pulling their data from the same place, hence with the same limitations. —Cryptic 03:01, 15 August 2024 (UTC)
As above, search can sort of do this. This looks for "{{", zero or more spaces, "ietf", any single character, then "rfc", case insensitive, with all of that occurring at least forty times in the same mainspace page (both redirects to {{IETF RFC}} with any transclusions at all match that pattern). It finds 13 articles. It could very well be missing some, though; for example, {{IETF RFC|1234}} with extra spaces in the middle would render normally but not be searchable by this. —Cryptic 03:34, 15 August 2024 (UTC)
Hello lovely volunteers. I'm trying to calculate 1) The number of articles deleted in 2022, and 2) The percentage of article deletions in 2022 that were the result of an AfD discussion. I think existing queries do this but would appreciate it if someone could check my interpretation of the queries.
Here are my assumptions:
My definition of an "article" is a mainspace page that is not a redirect. I don't mind including dab pages, pages with no links, etc.
I assume that all articles deleted as a result of an AfD discussion have "Articles for deletion" in the log reason
From quarry:query/78694 I see that 109,583 mainspace pages were deleted in 2022
From quarry:query/78460 I see that in 2022, 37,297 pages were deleted with "redirect overwrite" in the log reason
From quarry:query/78460 I see that in 2022, 5380 pages were deleted with "Redirects for discussion" in the log reason
From quarry:query/78460 I see that in 2022, 13,635 pages were deleted with "Articles for Deletion" in the log reason
I assume that the number of mainspace redirects not accounted for in the above queries is negligible or too difficult to get (is this true?)
Therefore it appears that:
In 2022, 66,906 articles were deleted (109,583 - 37,297 - 5380), an average of 183 per day
In 2022, 13,635 articles were deleted via AfD, an average of 37 per day
In 2022, 20% of article deletions were the result of an AfD discussion.
#3 and #4 are accurate; #5-7 aren't. And #3 isn't either if you'd count a move out of mainspace without leaving a redirect as deletion. Query 78460 doesn't do quite what you think it does. It's not showing the total number of deletions whose logs mention A7, the total number that mention AFD, etc; it tries to assign each deletion to a single reason in a given order. So if, say, a page was deleted with comment "Wikipedia:Articles for deletion/Spacely Sprockets; also a WP:CSD#G11" it would be counted as G11 and not AFD. And there's no way to find out if a page was a redirect or not when it was deleted, other than if the comment mentions an R-series criterion (which that query doesn't look for) or RFD, or if it's the automatic deletion during a page-move. You could conceivably look at the length of the most recent deleted revision, but there's lots of redirects with more bytes in them than lots of short articles. I'd have no confidence in any query of the public replicas that purported to accurately count the number of redirects deleted. But if you were trying for a SWAG, you can still do a lot better than this query - sum the automatic overwrites, plus the R-series speedies, plus the RFDs, plus some proportion of appropriate-looking G6s and G8s (G14s using its redirect clause aren't ever distinguishable from other G14s in my experience). —Cryptic 20:09, 26 August 2024 (UTC)
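For illustration, the assign-each-deletion-to-exactly-one-reason pattern described above looks roughly like this sketch (not the actual query 78460; it assumes the replicas' comment_logging view, and the WHEN order is what makes the Spacely Sprockets example count as G11 rather than AFD):
SELECT CASE
         WHEN comment_text LIKE '%G11%' THEN 'G11'
         WHEN comment_text LIKE '%Articles for deletion%' THEN 'AFD'
         ELSE 'other'
       END AS reason,
       COUNT(*) AS deletions
FROM logging
JOIN comment_logging ON comment_id = log_comment_id
WHERE log_type = 'delete'
  AND log_action = 'delete'
  AND log_namespace = 0
  AND log_timestamp LIKE '2022%'
GROUP BY reason;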
Excellent points, thanks. Re #6, I also just realized that when an article is deleted at AfD, its Talk page(s) and redirects to it are also deleted with "Articles for Deletion" in the log reason. The best way to count articles deleted via AfD is User:JPxG/Oracle as far as I can tell. According to that page, in 2022, the daily average number of AfD deletion discussions that resulted in “Delete” or “Speedy Delete” was 33. I'll work on refining this. Cheers, Clayoquot (talk | contribs) 23:50, 27 August 2024 (UTC)
56, give or take. That's not quite what you're asking; it's the number of current admins (including adminbots) with more than 1000 edits in the last 30 days. I didn't crossref the "active admins" page since they should be mostly the same, give or take recent desysops (Pppery would've just missed the list with 956 edits) or re-activations. Going back more than 30 days is much slower, roughly five or six minutes per month (including the first). —Cryptic 23:07, 26 August 2024 (UTC)
By the way, 1000 edits a month is a lot. I used to hit this back when I was backlog crushing in 2021. Xtools. Nowadays I only watchlist, then spend the rest of my time on coding which doesn't really increase my edit count much, and I no longer hit 1000 edits a month, even though I edit every day. So be careful, such a high threshold may exclude some active admins such as myself who spend hours on wiki every day. –Novem Linguae (talk) 10:17, 27 August 2024 (UTC)
I agree. I'm rarely in the 1,000+ edits a month category myself. [3] If we have good admins that don't have crazy high edit counts once they get the bit, it stands to reason that we're missing out on some good potential admins due to editcountitis. The whole point of this query was inspired by my essay and my comments about it in the above thread. Clovermoss🍀 (talk) 10:25, 27 August 2024 (UTC)
Groovy. If it helps, I think the de facto edit counts to pass RFA nowadays are 8000 total (due to 0xDEADBEEF passing with this recently, and no one passing with anything lower since GoldenRing in 2017) and a couple hundred edits a month for the last 6 months or so (enough to show that you aren't inactive). –Novem Linguae (talk) 17:05, 27 August 2024 (UTC)
@Clovermoss: If you have a candidate for adminship in mind I'd be happy to show them how to fix typos on a large scale. There are many tasks (that may be a bit behind the scenes and boring) that require making many edits. Polygnotus (talk) 17:24, 27 August 2024 (UTC)
Whereas I've averaged 1000+ edits per month almost every year since getting the bit, and I don't appear on Cryptic's list of 56. Been doing a lot of gardening this summer, I guess. :D Valereee (talk) 11:41, 28 August 2024 (UTC)
Valereee, your count from a similar query without the 1000-edit cutoff is 584. That's roughly correct looking at your contributions; it's a simple count of edits in the past 30 days. It won't have counted the two deleted edits you have from that period. —Cryptic 12:30, 28 August 2024 (UTC)
Oh, yes, I typically do have fewer edits this time of year. I was more thinking of averages -- that is, if we're looking for potential candidates who average (say) 1000 edits per month, we wouldn't capture even someone who averages more than 1000 but not in the past month, if you see what I mean. I tend to do more editing when it's dark 13 hours a day, so in January I'd almost definitely be captured. In a typical August, possibly not. Valereee (talk) 12:47, 28 August 2024 (UTC)
Well, by happy coincidence, I just ran this query. It shows admins (again, only users who currently have the bit) who have at least 1000 live edits from at least one of the past four calendar months, i.e. April-July. 100 qualified in at least one month; 60 of them, including at least three bots, averaged over 1000 for the entire period. —Cryptic 12:50, 28 August 2024 (UTC)
Oh, wow, that is fascinating, thank you! 3 out of 5 who made 1000 edits at least once in the past four months also averaged more than 1000 over that period! People who are in are in, I guess?
Could we run a similar query for non-admins with, say, at least a year's experience, at least 10K edits, and 1000 edits at least once over the past four months, or would that be a headache/turn out a way too huge number? Valereee (talk) 13:04, 28 August 2024 (UTC)
It'll come up at quarry:query/85881 eventually. That's going to be a lot slower, since it has to look at all edits in that time period instead of just those made by the 854 current admins. —Cryptic 13:13, 28 August 2024 (UTC)
section headers starting with a lowercase "w" followed by an uppercase letter or a number
I discovered something weird, see Wikipedia:Help_desk#Many_W's. Is it possible to run a query or write a regex search or whatever that meets these criteria:
Finds section headers (no matter how many levels deep) that start with a lowercase "w" immediately followed by an uppercase letter or a number. Polygnotus (talk) 15:05, 28 August 2024 (UTC)
The database replicas don't have article text, so the only place these would show up is if there are redirects to those sections. Regex searches aren't indexed, so don't work well on their own; you have to pair them with something that is, like a category or template or outgoing link or normal search. But if you had that, I'd think insource:/== *w[A-Z0-9]/ would find just about all of these; it'd miss weird space characters, which should be uncommon anyway, and level-1 headers, where you'd get too many false positives with template parameters. Example search, paired with "meters". —Cryptic 15:23, 28 August 2024 (UTC)
Thank you! I think I have them all. I am not worried about level 1 headers because I think that every header that follows this pattern is a subheader of something like "Women's events" or "Women's results" or similar. Polygnotus (talk) 15:34, 28 August 2024 (UTC)
All categories with "LGBT" in their title
Would it be possible to get a list of all categories with LGBT (without a Q) in their title? They all need to be WP:C2D-renamed to change from LGBT to LGBTQ following Talk:LGBTQ#Requested move 14 August 2024, and a query which can be redone at will to get the currently-outstanding list would be helpful. Thanks, HouseBlaster (talk • he/they) 03:08, 31 August 2024 (UTC)
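One plausible, rerunnable shape for that list (a sketch; the simple exclusion would also skip any title that happens to contain both "LGBTQ" and a bare "LGBT"):
SELECT page_title
FROM page
WHERE page_namespace = 14              -- Category:
  AND page_title LIKE '%LGBT%'
  AND page_title NOT LIKE '%LGBTQ%';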
I would like a sentence in Wikipedia:Statistics#Page views that says something like "As of 2024, the median page views for an article is 6 per day, with n% of articles falling into the range of 3 to 50 page views per day" (where 50% ≤ n ≤ 90%, to show the middle range). I think a sample from 10,000 articles (e.g., User:BilledMammal/Average articles) or even just 1,000 would be sufficient.
View counts aren't exposed in the public database replicas AFAIK, so this isn't possible other than by hitting the API once for each page. —Cryptic 19:24, 10 September 2024 (UTC)
After excluding ~200 pages with 0 reported page views from 2023 (because they might have been created during 2024), I find: mean of 5,725 page views per year, median of 55 page views per year, mode of 1. 25% of articles get 3 page views per year(!) or less. 75% of articles get 570 page views per year — or less. Almost exactly 10% of articles average 10+ page views per day. WhatamIdoing (talk) 21:05, 10 September 2024 (UTC)
Cross Reference Request
Would someone be able to cross-reference the links in this list Green Bay Packers All-Time Roster to verify that every target page in that list is also categorized in that category. The goal for me here is to make sure the links in the list are targeted to the right article. As an example, the list had Rex Smith, when it should have had Rex Smith (American football) as the correct target. Thus, this cross-reference will identify any disambiguation that needs to occur on that page for links that aren't disambiguation pages (which I have already fixed). Thank you! « Gonzo fan2007 (talk) @ 22:14, 18 September 2024 (UTC)
Just currently-submitted ones, or ones that have been accepted, too? If the first, you can see these by going to Special:Contributions/79lives, expanding "Search for contributions", picking "Draft" from the "Namespace" dropdown, and clicking the blue "Search" button. That gives you this. If the second, Special:Log/create/79lives comes close. —Cryptic 13:09, 4 October 2024 (UTC)
how many transclusions of a list of templates
Yeah, I can repeatedly fill in the blank but I'd really rather not have to do that. But, I don't know anything about database queries; hence my request.
Given a list of templates, is it possible to fetch each template's transclusion count and return both the template name and the count for each one? The templates that I have in mind are a filtered subset of the templates listed at Category:Lang-x templates.
The "every template in that category" query is just fine; I should have said that the category is lightly filtered, so those that I don't care about can be removed by hand.
Not via Quarry, but I could do it via wikitech:PAWS. Excluding Meta, MediaWiki.org, and testwiki which aren't content projects so have lots of false positives, this gives the following results:
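For a single wiki, the per-template count is straightforward on the replicas; a sketch assuming the current templatelinks/linktarget schema, with placeholder template names standing in for the filtered subset:
SELECT lt_title,
       COUNT(*) AS transclusions
FROM templatelinks
JOIN linktarget ON lt_id = tl_target_id
WHERE lt_namespace = 10                          -- Template:
  AND lt_title IN ('Lang-fr', 'Lang-de')         -- hypothetical subset of Lang-x templates
GROUP BY lt_title
ORDER BY transclusions DESC;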
how long the account was active for (e.g., if the person edited from 2010 to 2020, then say "10 years").
I want to write sentences that say things like "10% of our most-active editors are no longer editing" and "Among these former editors, most of them edited for about 10 years before quitting or being blocked". Ultimately, I'm hoping to use it as a way to ballpark a replacement rate for high-volume editors. This will be imperfect, but it might give me a very general idea. (Better suggestions are welcome.) WhatamIdoing (talk) 07:19, 14 October 2024 (UTC)
The only editor whose userpage is linked from that list and who hasn't edited in 2024 is Geo Swan. 6330 days between first and last edits, about 17 years four months. —Cryptic 14:42, 14 October 2024 (UTC)
The reason for that being that users who haven't edited in a while are normally unlinked. For some mysterious reason Geo Swan is still linked, though. And Geo Swan is a bad example, as they were banned (over a single incident that in my opinion did not warrant a ban) rather than choosing to stop editing. * Pppery * it has begun... 16:31, 14 October 2024 (UTC)
Well, that's inconvenient, though I suppose that it makes it easier to figure out which editors are inactive. 727 out of 1,000 accounts are still linked; therefore, 27% of editors who have made the most edits are inactive. (There are also 11 "placeholders" and a couple of blocked accounts, so ±2%.)
I discovered why those usernames are unlinked independently, after a couple iterations of a query that takes an hour and a half to complete. Très annoying. If you make a user subpage linking to just the unlinked users (and Geo Swan, too, I guess), I can rerun it against that. —Cryptic 22:17, 14 October 2024 (UTC)
Iff that's typical for the lifespan – and it might very well not be, in which case it is almost certainly an underestimate – we may need to double that. I've previously estimated that our current retention rate gets us about enough folks each year to replace 4% of the people who have made 100K edits, or 25 years for full turnover. WhatamIdoing (talk) 21:39, 25 October 2024 (UTC)
List of all talk pages matching "Archives\s*\/\s*\d{1,3}"
Usually archive pages on Wikipedia are of the format "/Archive 1", "/Archive 2",... Often when talk pages are moved, the mover does not update the archiving instructions for the bots. This causes the bot to send sections to archives titled "Archives/ 1", "Archives/ 2", breaking the archiving pages pattern as well as sequence. For example, the last archival before the move might be to "Archive 4". After move, newer sections go to "Archives/ 1". In order to fix them, I would need this query. Thanks! —CX Zoom [he/him] (let's talk • {C•X}) 20:06, 2 November 2024 (UTC)
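A sketch of one way to pull that list (in the database, spaces in titles are stored as underscores, so the broken "Archives/ 1" form appears as "Archives/_1"; the unanchored REGEXP can't use an index, so this scans every talk page and will be slow):
SELECT page_namespace, page_title
FROM page
WHERE MOD(page_namespace, 2) = 1                     -- all talk namespaces
  AND page_title REGEXP 'Archives/_?[0-9]{1,3}$';    -- the _? is the stray space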
While making this edit recently, it occurred to me that we ought to have a way of at least semi-automatically identifying and tagging articles with either a single or no sources. I'd like to be able to do an AWB run of likely such articles.
Given that there are many different ways to do sources, I'd like to start with a conservative query, which lists all articles that contain none of the following strings:
<ref
http://
Notes
cite
Reference
Sources
Citation
Bibliography
sfn
I don't know how to construct a RegEx query with a negative (the internet seems to have some ideas, but I struggle to convert this into Wikipedia's flavor), so I'd appreciate some help. Could anyone help me generate this list? Cheers, Sdkb talk 05:14, 14 November 2024 (UTC)
If someone comes here looking for help with Elasticsearch's middle-end, they're going to be very, very disappointed. —Cryptic 08:13, 14 November 2024 (UTC)
Thanks, @Pppery! After expanding the query to -insource:/([Rr]ef|http|[Nn]otes|[Cc]ite|[Ss]ources|[Cc]itation|[Bb]ibliography|sfn|list of|lists of|link|further reading|Wiktionary redirect)/ -intitle:list -deepcategory:"Set index articles" it's starting to turn up mostly useful results. Cheers, Sdkb talk 07:17, 14 November 2024 (UTC)
You can get more results before it times out by adding more non-regex filters. For instance, adding -hastemplate:"Module:Citation/CS1" gives 15k results instead of just 2k. – SD0001 (talk) 07:39, 14 November 2024 (UTC)
Anyway, the sort of things this page can do to answer your original question are to give you lists of pages with zero, or zero or one, external links, or that don't transclude any of a set of templates, or both; and as a bonus filter out redirects (which I'm fairly sure search does whether you like it or not), disambigs, and - to some extent - list pages. —Cryptic 07:16, 14 November 2024 (UTC)
https://www.w3schools.com/sql/sql_isnull.asp indicates that my syntax should be valid. Two alternative ways to do the same thing? Regarding the "prior noise", I'm a more competent administrator who's checking page histories, and leaving redirects within user space alone. My current focus is on cross-namespace redirects from user pages of nonexistent users to outside of userspace. My recent deletion log will give you an idea; I'm trying to make a more specific query to reduce the noise level in the query results I've been working from. – wbm1058 (talk) 20:53, 14 November 2024 (UTC)
MariaDB supports ISNULL(), and it works the way Wbm1058 was trying to use it (modulo the misplaced space). SQL Server's ISNULL() is a synonym of COALESCE() instead. x IS NULL is generally safer precisely because of that incompatibility. —Cryptic 21:29, 14 November 2024 (UTC)
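A quick illustration of the difference, as it behaves on the replicas (MariaDB):
SELECT ISNULL(user_registration) FROM user WHERE user_id = 1;   -- MariaDB: 1 if NULL, else 0
SELECT user_registration IS NULL FROM user WHERE user_id = 1;   -- portable spelling of the same test
-- SQL Server's two-argument ISNULL(expr, fallback) behaves like COALESCE(expr, fallback) instead.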
I tried just changing the syntax of the "IS NULL" statement as suggested. It was cooking on that for a while, and then:
"Error
This web service cannot be reached. Please contact a maintainer of this project.
Maintainers can find troubleshooting instructions from our documentation on Wikitech."
It just ran to completion, so simply changing the "IS NULL" statement fixed the syntax error. Now on to figure out the results, and tweak the query to do what I really want it to do. Thanks for your help. wbm1058 (talk) 22:09, 14 November 2024 (UTC)
FYI, I'm now feeling the joy. User:Wbm1058/Userpages of nonexistent users is my report of 400 pages which I think may all be safely speedy-deleted under U2: Userpage or subpage of a nonexistent user. This report was culled from a report of 1960 pages, by INTERSECT with the user-table SELECT. This is indicative of the poor page-move interface design: editors who think they're publishing user drafts end up keeping pages in userspace when they really wanted to move them to mainspace, because they neglected the namespace dropdown in the move-page user interface. – wbm1058 (talk) 14:11, 15 November 2024 (UTC)
Dusty articles within all subcategories of a parent category
Is this possible? I'd like to get a list like Special:AncientPages but for anything within any subcategory of Category:Food and drink. It would make quite a nice little To Do list for times I feel like doing some research and writing but don't have a particular bee in my bonnet that very minute. Thanks for any help! Valereee (talk) 15:16, 17 November 2024 (UTC)
Yes, it's a list of articles by date of most recent edit.
Hm, on the second question. Ideally, what I'd end up with is a list of, say, food items that hadn't been edited in ten years. Or chefs, or restaurants, or food animals or whatever. Maybe I need to choose a more specific subcategory? Valereee (talk) 17:24, 17 November 2024 (UTC)
Not wasted at all. Not your fault the category system is terrible for datamining. There might be some value in finding the latest revision that wasn't marked minor, and maybe excluding ones made by bots too, but that's going to be harder and a lot slower. Would definitely need to cut the set of articles to look at to something on the order of a couple thousand before looking at the edits, rather than the tens of thousands in that first tree. —Cryptic 20:14, 17 November 2024 (UTC)
Thanks. And I've actually already found an article that needs attention from your 87976 query, so win!
The point for me here is looking for categories that have many articles that haven't been updated since before sourcing started modernizing. It's a bit tricky because the articles that were created first -- probably in any category -- are also likely the articles that get updated often, have multiple watchers, etc. So it's possible there just aren't huge numbers of food articles that need this kind of attention. Valereee (talk) 21:18, 17 November 2024 (UTC)
Number of articles that are actually articles
There are 7,015,007, but AIUI that includes disambig pages, stand-alone lists, and outlines, and maybe even portals (i.e., all content namespaces, not just the mainspace) but excludes redirects. Is there a way to get a count of the number of just plain old ordinary articles, excluding the other types? (A percentage from a sample set is good enough; I'd like to be able to write a sentence like "Of the 6.9 million articles, 6.2 million are regular articles, 0.45 million are lists, and 0.2 million are disambig pages.") WhatamIdoing (talk) 22:46, 17 November 2024 (UTC)
There is no clear definition of what a "regular article" is. Also, many pages are not correctly marked and categorized. Don't forget WP:INDEXES, which look like ordinary articles, or might be, depending. -- GreenC 00:43, 18 November 2024 (UTC)
{{NUMBEROFARTICLES}} seems to be mainspace non-redirect pages. I'd thought it used other heuristics, too; I remember needing at least one link, and less confidently requiring a period? But plainly it doesn't anymore; I'm getting 6912240 for ns0 !page_is_redirect on the replicas now. There's only 362201 non-redirects in Category:All disambiguation pages and mainspace. Most of the difference is in other namespaces, probably legitimately, though I'm surprised to see 208 in user:, 44 total in various talk namespaces, 9 mainspace redirects, and a single redirect in draftspace. 114253 mainspace non-redirects in Category:All set index articles, though 64 of those are in the disambig cat as well. Lists are less certain; there's no Category:All list pages. I could try to count pages that are in any category starting with "Lists " or ending with " lists", but - not having done that before - don't have any idea how many it would miss, and how many it would miscount. Ditto with pages starting with "List of " or "Lists of " (which is easy - 120653, not counting any redirs or pages in the dabs or set index cats). —Cryptic 01:00, 18 November 2024 (UTC)
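The counts above come from queries of roughly this shape (a sketch; note cl_to stores underscores rather than spaces):
-- Mainspace non-redirects, the {{NUMBEROFARTICLES}}-style figure:
SELECT COUNT(*)
FROM page
WHERE page_namespace = 0 AND page_is_redirect = 0;
-- Of those, the disambiguation pages:
SELECT COUNT(*)
FROM page
JOIN categorylinks ON cl_from = page_id
WHERE page_namespace = 0 AND page_is_redirect = 0
  AND cl_to = 'All_disambiguation_pages';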
So 6,912,240 non-redirect pages, of which 362,201 are dab pages and 120,653 are Lists (per title), and the rest (e.g., Outlines, Index pages) is immaterial. A good SIA looks like an article and an underdeveloped one looks like a dab page, which takes us back to GreenC's point about it not being a clear-cut question.
All of this suggests that if you count SIAs as 'articles', then there are 6.429 million articles (93%) and if you count them as lists/dabs, then there are 6.315 million articles (91%).
Today's question is how old the typical currently active WP:EXTCONFIRMED account is. The requirements are:
is currently active (perhaps made at least one edit during the last 30 days? Any plausible definition of active works for me, so pick whatever is easiest and cheapest to run)
meets EXTCON (all of which will have the EXTCON permission)
I don't care whether it's date of first edit vs registration date. I also don't care whether it's all ~73K of them or if it's a sample of 5K–10K. I am looking for an answer to the nearest year ("Most active EXTCON editors started editing before 2014" or "If you see someone editing an article under EXTCON, they probably started editing more than 10 years ago").
Hmm. user_touched has been scrubbed because it is private data. So I guess LEFT JOIN recentchanges to see who is active? This should only get us users who have made an edit in the last 30 days. Then run MEDIAN() on it. Let's see if quarry:query/88037 finishes. If the count is 72,000ish, then I also need to add a WHERE to filter out the editors who aren't in recentchanges. –Novem Linguae (talk) 18:33, 19 November 2024 (UTC)
That's going to get you not just every user with the user right - the whole point of a left join is that you get a result whether there's a row in the joined table or not - but a row in your resultset for every row they have in recentchanges. And you're leaving out admins, who have extended confirmed implicitly. Plus, even if it worked, it would be a dump of ~25k values. MariaDB has a MEDIAN() window function, but I can't get it to work on user_registration no matter how I try to preprocess it first - it gives me "Numeric datatype is required for percentile_cont function" when I call it directly on the column, which is reasonable, but always 100000000 if I cast it to any kind of numeric value, which isn't. (Anyone know what I'm doing wrong? Sample query. And I've never really grokked window funcs or how to get them to behave like normal, sane, grouped-by aggregate funcs anyway.) But doing it longhand works just fine. quarry:query/88039: 28 May 2013. —Cryptic 19:37, 19 November 2024 (UTC)
user_registration is ASCII-encoded binary rather than a binary number which is why you're getting nonsense when casting it and trying to do operations on it. Uhai (talk) 21:28, 19 November 2024 (UTC)
Casting it seems to get me a numeric, and doing normal arithmetic on it (user_registration + 1, division, and so on) coerces it to a numeric; it doesn't get me nonsense until I try to pass it through MEDIAN(). And UNIX_TIMESTAMP() in particular is documented to return an unsigned int. Current theory is that MEDIAN() can't deal with large numbers (see resultset 4; dividing by numbers smaller than ten gets me 100 million again), which is boggling. No, a cast or operation on the result of MEDIAN() is what fixes it. Still boggling. Cleaner query. Thanks for the prod. —Cryptic 21:55, 19 November 2024 (UTC)
Also, comparing the results reminded me that user_registration is NULL for some users who registered before mid-2005ish, which I hadn't corrected for. 2013-06-15 19:42:14, though I doubt the two weeks' inaccuracy is going to matter much to WAID. —Cryptic 22:30, 19 November 2024 (UTC)
TIL CAST seems to convert to the proper numeric representation if the binary string contains only numeric ASCII characters. Glad you were able to get it working though. Uhai (talk) 22:34, 19 November 2024 (UTC)
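Putting the thread together, a minimal working shape for the median-registration query (a sketch: MEDIAN() is a window function in MariaDB, the trailing + 0 is the "operation on the result" that avoids the bogus 100000000, and NULL pre-2005 registrations are filtered out per the correction above; admins, who hold extendedconfirmed implicitly, would still need adding separately):
SELECT DISTINCT
       FROM_UNIXTIME(MEDIAN(UNIX_TIMESTAMP(user_registration)) OVER () + 0) AS median_registration
FROM user_groups
JOIN user ON user_id = ug_user
WHERE ug_group = 'extendedconfirmed'
  AND user_registration IS NOT NULL;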
Hi everyone. Cryptic kindly created this query which shows how many draftifications took place between 2021-07 and 2022-08. Could someone please fork it to show dates from 2016 to 2024? If it's easier, I'm fine with seeing the number of draftifications by year instead of by month. Many thanks and best wishes, Clayoquot (talk | contribs) 03:38, 14 December 2024 (UTC)
Measuring the number of source links to each domain for a given article/set of articles
Command denied
I keep getting the error, "execute command denied to user 's52788'@'%' for routine 'enwiki_p.count'". I was using the page database, but even after I modified my query to only use the externallinks database (meaning I need to input a numerical page ID instead of using the title), I'm still getting the denial. What am I doing wrong here? Am I just not allowed to aggregate? Here's my query, simplified as much as possible and still not working:
SELECT count (el_to_domain_index)
FROM externallinks
WHERE el_from = 37198628
GROUP BY el_to_domain_index;
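For what it's worth, the likely culprit is the space between count and the opening parenthesis: with that space, MariaDB parses count (...) as a call to a stored routine named count rather than the built-in aggregate, which is exactly the routine-privilege error quoted above. Without the space it runs:
SELECT COUNT(el_to_domain_index)    -- no space before the parenthesis
FROM externallinks
WHERE el_from = 37198628
GROUP BY el_to_domain_index;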
Now I'm trying to get counts for all the external links from all the pages in a category. I want to do this for each separate page, and get lists of all the actual URLs, but y'know, baby steps. I used this query: https://quarry.wmcloud.org/query/89031
USE enwiki_p;
SELECT el_to_domain_index,
count(el_to_domain_index)
FROM externallinks
JOIN categorylinks ON cl_from = el_from
WHERE cl_to = 11696843
GROUP BY el_to_domain_index
ORDER BY count(el_to_domain_index) DESC;
I'm not getting any results and it takes ages to not get them. What am I doing wrong now? Also, how do I include pages in any subcategories, or does this include them automatically? Safrolic (talk) 00:57, 22 December 2024 (UTC)
I figured out that I need to use page despite the slowness it'll cause, because cl_to uses a name instead of an ID. So here is my new query, now also running on simplewiki for easier testing. https://quarry.wmcloud.org/query/89032
USE simplewiki_p
SELECT page_title,
el_to_domain_index,
count(el_to_domain_index)
FROM externallinks
JOIN categorylinks ON cl_from = el_from
JOIN page on cl_from = page_id
WHERE cl_to = Canada
GROUP BY page_title, el_to_domain_index;
This query though has a syntax error on line 2.
I also think I might be in the wrong place to ask for step-by-step help like this. If there's a better place for me to go, I'd appreciate the direction. Safrolic (talk) 02:18, 22 December 2024 (UTC)
You don't need the USE statement on Quarry since you have to select a database there separately (since most are on different servers now); but if you keep it, you need to terminate it with a semicolon. Next error you'd get is that you need to quote 'Canada'. At least that one has a useful error message ("Unknown column 'Canada' in 'where clause'"). The reason your first query took forever is because SELECT * FROM categorylinks WHERE cl_to = 11696843; does a full table scan - it tries to coerce each row's cl_to (a string value) into a number, and then does a numeric comparison. There's no correct way to use the index on cl_to since many different strings compare equal to that number, in particular ones starting with whitespace. SELECT * FROM categorylinks WHERE cl_to = '11696843'; on the other hand finishes instantly with no results (since Category:11696843 has no members). Categories are only loosely tied to the page at their title anyway. You won't get members of subcategories like that - you have to go far out of your way to do so, similar to quarry:query/87975. You would get the direct subcategories like simple:Category:Canada stubs themselves, if any happened to have any external links. Distinguish them by selecting page_namespace too, if you're not already filtering by it. —Cryptic 02:56, 22 December 2024 (UTC)
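Folding those fixes in (semicolon-terminated USE, quoted string, page_namespace selected to tell articles from any subcategory pages), a corrected version of the query would be:
USE simplewiki_p;                       -- optional on Quarry; note the semicolon
SELECT page_namespace,
       page_title,
       el_to_domain_index,
       COUNT(*) AS links
FROM externallinks
JOIN categorylinks ON cl_from = el_from
JOIN page ON page_id = cl_from
WHERE cl_to = 'Canada'
GROUP BY page_namespace, page_title, el_to_domain_index;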
It sounds like I'm better off doing a multipart kludge- getting all the relevant page titles with Massviews or Petscan, running a single query to turn them into IDs, then using those IDs as el_froms so I only need the externallinks database. Thank you for your help! Safrolic (talk) 05:59, 22 December 2024 (UTC)
Orphaned editnotices
When a page is moved, its editnotice is not moved with it. There is a post-move warning for it, but users would need to move it separately. That too can only be done by template editors, page movers and admins. I believe that there are plenty of editnotices that have become orphaned from their target page. I need a query to list such pages. If there is already a regularly updated database, that will work too. Thanks! —CX Zoom [he/him] (let's talk • {C•X}) 07:53, 25 December 2024 (UTC)
I suspect it's much worse than that. It's certainly more complex. There's plenty of mainspace titles with colons in them, and it's conceivable that some of those have orphaned editnotices; there's really no way around parsing for the namespace name, and that's going to be ugly and complex, and I haven't tried it yet. (It being Christmas morning and all. Maybe tomorrow.) But I wouldn't estimate that to result in more than a handful of other hits. Much more likely is the case that CX Zoom mentions directly: a page is moved but the editnotice isn't, leaving it attached to the remnant redirect. There's going to be false positives looking for those whether we do it the "correct" way and look in the move log (since there might be an editnotice intentionally attached to a page that had another page moved from it in the past), or whether we include editnotices attached to pages that are currently redirects. The latter's easier, and especially easier to combine with the query looking for pages that don't exist; I've done it at quarry:query/89148. That'll also miss editnotices that unintentionally weren't moved with their page, where the resulting redirect was turned into a different page, though. —Cryptic 15:30, 25 December 2024 (UTC)
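For the simple (mainspace-only) half, a sketch: editnotices live at Template:Editnotices/Page/<target title>, and 'Editnotices/Page/' is 17 characters, so the target lookup is just a string operation. The colon filter below is the crude part - as noted above, mainspace titles can themselves contain colons, and handling those properly needs real namespace parsing:
SELECT en.page_title
FROM page AS en
LEFT JOIN page AS target
       ON target.page_namespace = 0
      AND target.page_title = SUBSTRING(en.page_title, 18)
WHERE en.page_namespace = 10                             -- Template:
  AND en.page_title LIKE 'Editnotices/Page/%'
  AND en.page_title NOT LIKE 'Editnotices/Page/%:%'      -- crude mainspace-only filter
  AND (target.page_id IS NULL OR target.page_is_redirect = 1);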
In my travels I have come across some very long ref names in ref tags, sometimes automatically generated by incorporating the title of the work that is being referenced. Occasionally I will shorten excessively long ref names just to improve readability of the wikitext in that section. This has me curious as to whether it is possible to generate a list of the longest ref names being used in articles, as there are probably some likely targets for this kind of cleanup. Is it possible to either generate a list of the longest ref names in order of length, or barring that, a list of ref names that are more than, say, 50, or perhaps 75, characters? BD2412 T 02:49, 29 December 2024 (UTC)
References aren't in the database except in page source, which isn't available in the replicas. You can find some with search, like insource:ref insource:/\< *ref *name *= *[^ <>][^<>]{76}/. That's going to miss a bunch of them, most obviously any refs crazy enough to include angle brackets in their names (though those mostly seem to be errors anyway) or ref syntax using non-standard spaces, but ElasticSearch's gratuitously awful regex engine can't do a whole lot better. Also won't find ref names populated through templates - I understand some infoboxes do this. —Cryptic 05:26, 29 December 2024 (UTC)
This is definitely plenty to start with. It is crazy that there are so many lengthy ref tags. The Wikipedia article with the most references has under 1000 of them, and if every ref name were an arbitrary combination of letters and numbers, they could all be handled with two-character ref names. BD2412 T 18:44, 29 December 2024 (UTC)
Australia Project
I am interested to know how the Australian project is progressing.
Number of
articles created
articles deleted
edits by editors with/without Australia user boxes.
Which specific user boxes? What time frame? Articles those users have deleted, or articles those users created that anyone deleted? Counting edits by editors without specific user boxes is Right Out; it's going to be well over a billion, would take days to count if the query didn't die (it would), and would be useless for any purpose. —Cryptic 16:46, 2 January 2025 (UTC)
I bring https://quarry.wmcloud.org/query/89411 back to haunt you. I'm trying to assemble a table of registered editors per year. I have figured out how to modify the query to pick a different year. But what I want now is the number of registered editors in each year who made 10+ edits during that year (so, 10 edits in 2024 counts, but 5 edits in 2023 plus another 5 edits in 2024 does not), 100+ edits, and 1,000+ edits.
I can't think of a way to do this that doesn't look at every edit in the time range. I think it's likely the query will die. But then, it managed to complete for one month (January 2024), so maybe quarry:query/89557 for all of 2024 will eventually too. No counts of currently-deleted edits this time. —Cryptic 12:04, 5 January 2025 (UTC)
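The shape of what it has to do, for reference (a sketch; restricted to registered accounts via the actor table, and - as noted - liable to die because it scans every 2024 revision):
SELECT SUM(edit_count >= 10)   AS editors_10_plus,
       SUM(edit_count >= 100)  AS editors_100_plus,
       SUM(edit_count >= 1000) AS editors_1000_plus
FROM (SELECT rev_actor, COUNT(*) AS edit_count
      FROM revision
      JOIN actor ON actor_id = rev_actor
      WHERE actor_user IS NOT NULL            -- registered accounts only
        AND rev_timestamp LIKE '2024%'
      GROUP BY rev_actor) AS per_editor;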
Maybe this is something that would have to be done by the WMF's Analytics team. I can get "one edit this year" from the original query that you wrote for me, and I'm currently slowly walking it back. It takes ~40 minutes to run, plus several hours for me to remember to check it.
Better to fork it. The time for deleted edits isn't going to change much - it has to do a full table scan, since there isn't an appropriate index - but it's still the live edits that take the bulk of the time, and that is improved by narrowing the timespan looked at. —Cryptic 05:13, 6 January 2025 (UTC)
Okay. I forked it to quarry:query/89581, changed the years from 2024 to 2023, and set it to run again. If this works, then I can repeat that step a dozen times.
That's expected, regardless of which earlier query you mean. Query 89557 will have fewer than 89569 because there's plenty of users who have deleted edits in 2024 but no currently-live ones. One based on 89411 will have very slightly fewer because the views of the revision and archive tables it's looking at are slightly more heavily redacted than the ones 89557/89569 use. —Cryptic 05:35, 6 January 2025 (UTC)
89411 tells me there were 812,635 editors in 2023.
89581 (forked from your new 89569) tells me there were 11 fewer editors in 2023.
But the same scripts for 2024 vary in the opposite direction. The second script finds 29 more editors in 2024. Mostly the second script seems to be finding a small handful more editors (2 to 50) in each year. WhatamIdoing (talk) 02:46, 7 January 2025 (UTC)
I don't see a run in 89411's history for 2023. quarry:query/80211 shows that figure, but it was run almost a year ago. I've just rerun it, and it's giving me 812621, which is both more consistent with the 812624 from the new query and offset in the right direction. Or less wrong direction, anyway. I'm reasonably confident that the reason we're getting fewer numbers from the same queries now compared to a year ago is because there's been more revdeletions and suppressions in the meantime; neither query can see such edits. It doesn't surprise me a bit to find out there's been revdeletions of 2023 edits made after early February 2024. I'll run a comparison of users that would be seen by the two queries so we can have a better idea why they show up in the second and not the first; it's going to take a while though. —Cryptic 04:25, 7 January 2025 (UTC)
(Which, of course, isn't that helpful an answer, so I've updated the query in-place to exclude talk pages that are redirects. —Cryptic 17:43, 15 January 2025 (UTC))
Serial commas in page titles
I posted the following request at WP:BOTR and was advised to come here.
Extended content
Hello, I'm not sure that this request can be completed automatically; please accept my apology if it can't. I just want some lists, without edits to anything except the page where you put the lists, so it's not a CONTEXTBOT issue: just a "good use of time" issue. Could you compile some lists of pages in which serial commas are present or are omitted? I just discovered List of cities, towns and villages in Cyprus and created List of cities, towns, and villages in Cyprus as a redirect to support serial commas. Ideally, whenever a page could have a serial comma in the title, we'd have a redirect for the form not used by the current title, but I assume this isn't always the case.
First off, I'd like a list of all mainspace pages (whether articles, lists, disambiguation pages, anything else except redirects) that use a serial comma. I think the criteria might be:
[one or more words]
comma
[one or more words]
comma
["and" or "or"]
[one or more words]
I'm unsure whether they're rigid enough, or whether they might return a lot of false positives.
Secondly, I'd like a list of all pages whose titles are identical to the first list, except lacking a serial comma. Redirects would be acceptable here, since if I'm creating serial-comma redirects, it helps to know if it already exists.
Thirdly, I'd like a list of all mainspace pages (whether articles, lists, disambiguation pages, anything else except redirects) that could use a serial comma but don't. I think the criteria would be:
[Page is not on first or second list]
[one or more words]
comma
[one or more words]
["and" or "or", but no comma immediately beforehand]
[one or more words]
Once the list is complete, the bot checks each page with the following process: "if I inserted a comma immediately before 'and' or 'or', would it appear on the first list?" If the answer is "no", the bot removes it from the list.
Fourthly, I'd like a list of all pages whose titles are identical to the third list, except they have a serial comma. Again, redirects are acceptable.
Is this a reasonable request? Please let me know if it's not, so I don't waste your time.
After my request, User:Qwerfjkl suggested that I come here and offered some comments: intitle:/[A-Za-z ]+, [A-Za-z ]+, (and|or) [A-Za-z ]+/ would work for the first request and intitle:/[A-Za-z ]+, [A-Za-z ]+ (and|or) [A-Za-z ]+/ would work for the second. The latter two lists are trickier. To this, I replied Is there a way to download a list of results from a particular search? As far as I know, the only way to get a list of results is to copy/paste the whole thing somewhere and delete everything that's not a page title. (With 11,544 results for the first search, this isn't something I want to do manually.) Also, the first search includes redirects, e.g. Orders, decorations, and medals of the United Nations is result #1.
Please ping me if you reply; I've never heard of this page before, and I might not remember to come back if you don't ping me. Thanks for your help. Nyttend (talk) 03:22, 20 January 2025 (UTC)
@Nyttend: Doing the whole thing would be very awkward; we don't have access to create temporary tables on the replicas, so it would have to all be in one query, and I haven't been able to cajole the query pessimizer into doing it efficiently. What I can do quite easily is give you lists of all mainspace titles that match each of those regexes, along with whether each is a redirect, and what the titles would be if the comma is added/removed (as appropriate). quarry:query/90019 and 90020. Those should be sufficient to construct your four lists, though perhaps not trivially. I can do that too if you're unable, though not tonight and maybe not tomorrow either; let me know. —Cryptic 04:32, 20 January 2025 (UTC)
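For reference, the title-matching half of that is a REGEXP over page (a sketch; page_title stores spaces as underscores, hence the _ in the character classes, and dropping the comma before (and|or) gives the non-serial-comma list):
SELECT page_title, page_is_redirect
FROM page
WHERE page_namespace = 0
  AND page_title REGEXP '^[A-Za-z_]+,_[A-Za-z_]+,_(and|or)_[A-Za-z_]+$';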
Could you give me two tables with four columns of data? First column is the link, second column is existing status (red, blue redirect, blue non-redirect), third column is alternate-comma variant (with-comma for the without-comma table; without-comma for the with-comma table), and fourth column is status of variant. If you do this in a simple table, I should be able to copy/paste it into Excel, sort so I can easily remove everything I don't want, and move the remaining content back to MediaWiki. Nyttend (talk) 04:44, 20 January 2025 (UTC)
Ah, never mind, I've explored the links and just realised that there's a way to download a CSV with almost all this information. This should suffice. Thanks! Nyttend (talk) 04:47, 20 January 2025 (UTC)
There's no way to see whether they have a parameter. The usual solution is to add a hidden tracking category in the template. —Cryptic 18:51, 24 January 2025 (UTC)
References aren't stored in the database at all except in the page text, which isn't copied to the toolforge replicas. —Cryptic 14:37, 24 January 2025 (UTC)
Hmm. How is the no-refs query made? Is there any way to change it from the existing "no refs exist" check to "x number of refs exist"? From my very novice understanding of the existing query, it checks the number of external links; can we modify it to find a specific number of them for the requested query? Thanks for your help. ~/Bunnypranav:<ping> 16:32, 24 January 2025 (UTC)
It's looking at whether there are any external links, yes, and I'll go ahead and change the query. But it's going to get hardly any results either way - the only pages it finds even without limiting external links at all are Battle of Nalos (3 links) and Operation Qazanchi (6), I suspect by its requirement of a no-citations tag, and one specifically added through page-triage at that. —Cryptic 16:50, 24 January 2025 (UTC)
Thanks Cryptic for that change, that will help catch some more possibilities eventually. @DreamRimmer, since I see you have expertise in Python scripts, could a script help find the number of citation tags for pages younger than 90 days and list them somewhere? ~/Bunnypranav:<ping> 06:28, 25 January 2025 (UTC)
I would say this is technically doable, but it's not advisable due to the large number of pages involved, roughly 50,000 to 60,000 pages created in the last 90 days. Checking the number of references for these pages would require making the same number of API requests, which would significantly waste resources. A more efficient approach would be to download the relevant database dumps and generate your list from that data. – DreamRimmer (talk) 07:40, 25 January 2025 (UTC)
Is there any way to limit the number of pages queried from the beginning? In any case, I am ready to at least try to process the data dumps from my end; could you possibly help me with a Python script, since I do not know enough to start from scratch? Up to you though, and thanks for the reply! ~/Bunnypranav:<ping> 07:43, 25 January 2025 (UTC)
@Novem Linguae I had to use my staff account to run queries against MediaWiki history, so I can't link the queries directly, and these should be taken with "Sam isn't an expert and might have missed something" caveats, but I found the following:
In 2024, 678,142 pages were created in the article namespace by users (not including bots). Of those, 170,684 are not currently redirects (pages may have changed since initial creation, so this isn't precisely a count of how many non-redirects were created). The rest of the numbers cover only those non-redirects and are percentages of that figure; they likewise exclude bots. (A sketch of this kind of tally on the replicas follows the list.)
101,987 (60%) pages were created by non-autopatrolled, non-admin users.
756 (0.4%) pages were created by non-autopatrolled, admin users.
67,941 (40%) pages were created by autopatrolled users.
77,412 (45%) pages were created by non-autopatrolled editors with fewer than 10,000 edits.
25,331 (15%) pages were created by non-autopatrolled editors with at least 10,000 edits.
100,929 (59%) pages were created by non-autopatrolled editors who do not have the patroller user group.
1,814 (1%) pages were created by non-autopatrolled editors who do have the patroller user group.
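For anyone wanting to reproduce this on the replicas rather than the MediaWiki history dataset, the shape of the query is roughly the following minimal sketch. It is only an approximation of the numbers above: it sees only surviving pages, current group membership, and current redirect status, and bot exclusion is left out. (Enwiki's autopatrolled group is internally named 'autoreviewer'.)

```sql
-- Minimal sketch: 2024 mainspace creations by registered users, split by
-- whether the creator currently holds the autopatrolled right.
SELECT SUM(ug.ug_group IS NOT NULL) AS by_autopatrolled,
       SUM(ug.ug_group IS NULL)     AS by_non_autopatrolled
FROM revision r
JOIN page p  ON p.page_id = r.rev_page
           AND p.page_namespace = 0
           AND p.page_is_redirect = 0
JOIN actor a ON a.actor_id = r.rev_actor
LEFT JOIN user_groups ug
  ON ug.ug_user = a.actor_user
 AND ug.ug_group = 'autoreviewer'  -- enwiki's internal name for autopatrolled
WHERE r.rev_parent_id = 0          -- first revision = page creation
  AND a.actor_user IS NOT NULL     -- registered accounts only
  AND r.rev_timestamp LIKE '2024%';
```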
Public version at quarry:query/90509 (history at quarry:history/90509/976754/947548). Can't figure out why my numbers disagree - I'm seeing about twice as many pages total - even considering I included pages created by non-users (either ips, or where the first revision had the user revdelled/suppressed, or conceivably bad imports). Percentages excluding those are broadly similar though. —Cryptic00:11, 6 February 2025 (UTC)
Thank you both for your hard work. This data makes it clear that advocating for something like autopatrolling admins would have a negligible effect on reducing the WP:NPP queue (0.4% reduction). This is good data that will help inform some NPP-related decisions. –Novem Linguae (talk) 00:22, 6 February 2025 (UTC)
Frown. Now I'm seeing fewer pages than you - 195,800 total currently-non-redirects, 107,941 currently-non-redirects by users. And that's without excluding bots. Are you including currently-deleted pages? —Cryptic 00:25, 6 February 2025 (UTC) Never mind, I see what I did, duh. quarry:query/90509 again, now with much-less-disagreeable numbers - all somewhat higher than Sam's, since it still includes bots. Interesting that the percentages turned out close to right even though I was looking at the wrong (essentially random) users' groups. —Cryptic00:46, 6 February 2025 (UTC)
I am actually kind of curious as to why an admin would ever not be autopatrolled. Are there admins who make poor quality new articles? BD2412T02:46, 6 February 2025 (UTC)
@Cryptic Good to hear we're on the same track :) MediaWiki history includes since-deleted pages, so that will be another minor source of disagreement. Sam Walton (talk) 09:03, 6 February 2025 (UTC)
Then my numbers should be lower than yours, instead of ranging from 10-30% higher across the board. Including bots isn't it; there were only four bot "creations" in my dataset, Kansas City shooting, Pedro Campos (disambiguation), 10 Lives (disambiguation), and Chris Sheppard (disambiguation), all by User:RussBot; and two are bad data anyway, showing up only because they were history merged. (History merges, despite being half of that very small sample, aren't going to come anywhere close to accounting for the difference. Even considering that the other two of these pages were cut-and-paste moves and could stand to have their histories fiddled with too.) Is your data source looking at where pages were initially created, or where they are now? —Cryptic11:25, 6 February 2025 (UTC)
@Cryptic Where they were initially created. The only caveat is the redirect issue I noted - the MW History dataset can say whether a page is currently a redirect, but doesn't store data on whether it was a redirect at the time of the edit. That would mean I'm undercounting based on pages which were created and then turned into a redirect some time between then and now, which might account for the difference? Sam Walton (talk) 15:37, 6 February 2025 (UTC)
No, my query also only sees the current redirect status. The difference is that mine also uses the current page location; while there's going to be some pages created in mainspace and moved to draft, there'll be many, many more created in draft or userspace first. These are probably the more relevant numbers, since those need to be patrolled too. I suppose I could add in deleted pages that were last in mainspace, but it's going to be slow - there's no usable index on timestamp, and the one on namespace isn't very selective - and there's no way to see redirect status. —Cryptic20:16, 6 February 2025 (UTC)
Again about New Pages
Following a January backlog drive, and for further research into improving NPP, would it be possible to generate the following queries over a one-month sample, 1 to 31 January 2025?
Number and % of new articles created in the article space by non-autopatrolled users, excluding redirects and dab pages
Number and % of new articles created by accounts that had existed for less than 30 days and made fewer than 500 edits
Number and % of new articles excluding redirects and dab pages, that were still unpatrolled after the sample period
Number and % of new articles excluding redirects and dab pages, that got deleted within the sample period.
Number and % of new articles that were recreated from previously deleted titles.
and if possible, Number and % of new articles that were created from redirects.
#4 (I've converted your requests from bullet points), at least, is going to be a problem - we can only see whether a currently-existing page is currently a redirect or dab. There's no access to the wikitext of either existing or deleted pages; for existing ones, there's a field that says whether it's currently a redirect, and dabs can be detected either through their categories or from a page property that's set on disambigs proper (though not on set indices), but none of those exist for deleted ones.
The queries that don't mention excluding redirects and dabs - #2, #5, and #6 - was this omission deliberate or not?
For #2, I can compare article creation time directly to its creator's registration time, to see if the user had registered within 30 days of creation; but checking editcount can only reasonably look at the current editcount. Given the latter, should I look at current account age for consistency, or account age at creation for the partial gain in accuracy? (The former's a lot easier for these particular numbers, since they happen to match up with the extendedconfirmed group.) —Cryptic00:20, 7 February 2025 (UTC)
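To make the registration-time comparison concrete, here is a minimal sketch of the #2 check (with, for #1, the redirect/dab exclusion via the page property mentioned above), assuming the enwiki_p replica schema; editcount and deletion checks are left out:

```sql
-- Minimal sketch: January 2025 mainspace creations, excluding current
-- redirects and disambiguation pages, where the creator had registered
-- within 30 days before creating the article.
SELECT p.page_title
FROM page p
JOIN revision r ON r.rev_page = p.page_id AND r.rev_parent_id = 0
JOIN actor a    ON a.actor_id = r.rev_actor
JOIN user u     ON u.user_id = a.actor_user
LEFT JOIN page_props pp
  ON pp.pp_page = p.page_id
 AND pp.pp_propname = 'disambiguation'   -- set on disambigs proper
WHERE p.page_namespace = 0
  AND p.page_is_redirect = 0
  AND pp.pp_page IS NULL
  AND r.rev_timestamp BETWEEN '20250101000000' AND '20250131235959'
  AND u.user_registration IS NOT NULL
  AND r.rev_timestamp <= DATE_FORMAT(
        STR_TO_DATE(u.user_registration, '%Y%m%d%H%i%s') + INTERVAL 30 DAY,
        '%Y%m%d%H%i%s');
```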
"The queries that don't mention excluding redirects and dabs - #2, #5, and #6 - was this omission deliberate or not?" That is correct; it was deliberate. For #4, just the number and % of new mainspace articles - any kind - that were deleted, or flagged for CSD or AFD, would be fine. For #2, whatever is easiest for you. I'm looking to use this to extrapolate a hypothesis of what it would have looked like if mainspace articles were only created by ExtendedConfirmed editors. Thanks. Kudpung กุดผึ้ง (talk) 11:43, 8 February 2025 (UTC)
What I would like is a way to match DOIs with those journals. For example, the journal Advanced Electronic Materials has a list of issues here: https://advanced.onlinelibrary.wiley.com/loi/2199160x. Sampling a few years of articles reveals that all such articles have DOIs that start with "10.1002/aelm".
I don't know if there's a way to query the Wiley database for this, or if we can use Wikipedia (or its dumps) to find our own matches in existing articles.
But the general desire is that, for all Wiley journals in that Excel sheet, we compile a list of DOIs and find whether there are matching initial patterns.
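There's no way to query Wiley's own database from the replicas, but DOI links already present in articles do show up in the externallinks table. A minimal sketch, assuming the current split-domain externallinks schema (where el_to_domain_index stores the protocol plus reversed host, e.g. 'https://org.doi.'), and using 10.1002/aelm purely as the example prefix from above:

```sql
-- Minimal sketch: how many mainspace external links point at DOIs with
-- the example Wiley prefix 10.1002/aelm.
SELECT COUNT(*) AS doi_links
FROM externallinks el
JOIN page p ON p.page_id = el.el_from
WHERE p.page_namespace = 0
  AND el.el_to_domain_index LIKE 'https://org.doi.%'
  AND el.el_to_path LIKE '/10.1002/aelm%';
```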
I would be curious to see a list of userspace subpages that have gone the longest without being touched. I suspect there are tons and tons of abandoned pieces of articles by long-gone users. BD2412T19:30, 14 March 2025 (UTC)
Now I am wondering if this can be correlated against the length of time since the user's last edit, so we can see which are subpages from long-gone editors. BD2412T22:08, 14 March 2025 (UTC)
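A minimal sketch of the first half, assuming the enwiki_p schema; joining the last edit through page_latest avoids scanning every revision. Correlating with the owner's last edit would additionally need the user and actor tables:

```sql
-- Minimal sketch: userspace subpages ordered by how long since their
-- last edit (oldest first).
SELECT p.page_title,
       r.rev_timestamp AS last_edited
FROM page p
JOIN revision r ON r.rev_id = p.page_latest
WHERE p.page_namespace = 2
  AND p.page_title LIKE '%/%'   -- subpages only
ORDER BY r.rev_timestamp ASC
LIMIT 100;
```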
... both of which are alt accounts of admins. More striking than the no-edits-ever cases is that the otherwise-most-inactive autopatrolled user is Nichalp, who somewhat infamously lost most of their permissions for undisclosed paid editing. * Pppery *it has begun...15:38, 28 March 2025 (UTC)
Yeah, that stuck out at me too. Also that they got autopatrolled and pcr almost a year and a half after their last edit, and almost exactly a year after all their permissions were revoked. —Cryptic17:06, 28 March 2025 (UTC)
Unusually large redirect talk archives
I recently found out that when pages and their archives are moved but the archive template is not updated, ClueBot III keeps adding talk sections below the old location of the archive (which is now a redirect). See Special:History/Talk:2024 Kolkata rape and murder incident/Archive 1. I need a list of talk archives that are redirects but are also unusually large (>250 bytes), ordered by page size in descending order. Thanks! —CX Zoom[he/him](let's talk • {C•X})16:15, 2 April 2025 (UTC)
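A minimal sketch of the request, assuming the enwiki_p schema; a genuine redirect page is normally well under 250 bytes, so anything larger is suspect. Only the main Talk: namespace (1) is covered here; other talk namespaces would need their numbers added:

```sql
-- Minimal sketch: Talk: archive subpages that are redirects yet larger
-- than 250 bytes, biggest first.
SELECT page_title, page_len
FROM page
WHERE page_namespace = 1
  AND page_is_redirect = 1
  AND page_title LIKE '%/Archive%'
  AND page_len > 250
ORDER BY page_len DESC;
```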
Articles with a particular template missing a data element from that template
How would I create a query that finds all articles with a template (e.g. "Infobox university") that are missing a particular data element in that template (e.g. "logo")? A couple of scenarios I'm thinking of:
In practical terms, you can't - template parameters aren't exposed to the database except as part of the wikitext, so you'd have to parse that yourself; and the text isn't copied to the database replicas, so you'd have to download a dump to do even that. Or grab the pages' text from the API, if you can narrow the set enough - Special:Whatlinkshere/Template:Infobox university has close to 27000 pages on it, which is about two orders of magnitude too many for that approach to work well. (Top of this page: Quarry does not have access to page content, so queries which require checking wikitext cannot be answered with database queries. In particular, there is no way to query for template parameters or anything related to references.) The usual solution is to change the template to add a tracking category. —Cryptic17:51, 2 April 2025 (UTC)
Thanks @Cryptic! Can you tell me more about how the solution with adding a tracking category would work (or point me to an example)? I'm good with rolling up my sleeves to find a way to do this. 😀 ⚙️ WidgetKid 🙈🙉🙊 19:37, 2 April 2025 (UTC)
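Roughly: the template is edited so that, when the parameter is empty, it emits a hidden category - for Infobox university, something like {{#if:{{{logo|}}}||[[Category:Pages using infobox university without a logo]]}} (that category name is hypothetical). Once the job queue has reparsed the transclusions, the category can be read straight off the replicas. A minimal sketch, assuming the classic categorylinks schema with cl_to:

```sql
-- Minimal sketch: articles in the hypothetical tracking category that
-- the template would add when |logo= is empty.
SELECT p.page_title
FROM page p
JOIN categorylinks cl ON cl.cl_from = p.page_id
WHERE p.page_namespace = 0
  AND cl.cl_to = 'Pages_using_infobox_university_without_a_logo';
```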