This is an archive of past discussions on Wikipedia:Link rot. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current main page.
deseretnews.com
Almost all links here are mapped redirects to articles at deseret.com, but conversion seems to be intractable, so the links should be archived. The converted links are of the form www.deseret.com/year/month/day/<id>/title-of-article, where the <id> seems to be unrelated to anything in the old link. Example: link [1] in 2012 United States presidential election is a mapped redirect to [2].
Fox News articles of the form foxnews.com/<section>/yyyy/mm/dd/.... are mapped redirects to articles of the form foxnews.com/<section>/title-of-article. Example: [3] in "Weird Al" Yankovic is a mapped redirect to [4] (note that the text at the end of the first URL differs from that of the second, with "adapting" apparently misspelled in the first). Conversion is usually tractable so long as the article title is known, as it is similar to the Chicago Tribune conversion.
Looks like two types of conversions: a simple URL transform by removing the date; and the harder "Chicago method" of extracting the title from the citation. I guess the best way is to try the simple method first, then the Chicago method if that fails; if neither works, check for ghost redirects; and finally add an archive. -- GreenC 15:59, 8 November 2024 (UTC)
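A minimal sketch of the simple method in Python; the function names, the example URL, and the liveness check are illustrative assumptions, not the bot's actual code:

```python
import re
import urllib.request

def drop_date_segment(url: str) -> str | None:
    """Ruled transform: foxnews.com/<section>/yyyy/mm/dd/slug -> foxnews.com/<section>/slug."""
    m = re.match(r"^(https?://(?:www\.)?foxnews\.com/[^/]+)/\d{4}/\d{2}/\d{2}/(.+)$", url)
    return f"{m.group(1)}/{m.group(2)}" if m else None

def looks_live(url: str) -> bool:
    """Crude liveness check (the real bot also detects soft-404s and follows redirects)."""
    try:
        req = urllib.request.Request(url, method="HEAD", headers={"User-Agent": "linkcheck"})
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.status == 200
    except Exception:
        return False

# Hypothetical example URL, for illustration only:
old = "https://www.foxnews.com/entertainment/2014/07/15/example-article-title/"
candidate = drop_date_segment(old)
if candidate and looks_live(candidate):
    print("simple method worked:", candidate)
```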
It's working, but it took a while to code, as this is the first time I've attempted sequencing all the methods at once. The "Chicago" method is still pretty custom; I need to integrate it into the boilerplate code as a standard feature. Also, with all these methods it's slow, so 7,000 pages will take a while. -- GreenC 19:58, 8 November 2024 (UTC)
I added two new concepts to the glossary: ruled mapped redirect, and inferred mapped redirect. In this case, the removal of the date from the URL is a 'ruled mapped redirect', i.e. a hard-coded rule to transform the URL. The parsing of the title is an 'inferred mapped redirect' because it infers (guesses) what the new URL might be, and can generate multiple guesses into an 'inference table', from which the bot checks each guess until it finds a match. The inferred mapped redirect code is now incorporated as a feature that can be enabled/disabled for each project. -- GreenC 06:26, 9 November 2024 (UTC)
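A minimal sketch of the two concepts in Python; the slug rule, liveness check, and function names are illustrative assumptions, not the bot's actual code:

```python
import re
import urllib.request

def slugify(title: str) -> str:
    """Rough approximation of a title-to-slug rule."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

def inference_table(old_url: str, title: str | None) -> list[str]:
    """Build an ordered list of guessed new URLs for one old URL.
    The two rules below are only examples of what such a table might contain."""
    guesses = []
    # Ruled guess: drop a /yyyy/mm/dd/ date segment if present.
    no_date = re.sub(r"/\d{4}/\d{2}/\d{2}/", "/", old_url, count=1)
    if no_date != old_url:
        guesses.append(no_date)
    # Inferred guess: rebuild the path from the citation title (the "Chicago" method).
    if title:
        m = re.match(r"^(https?://[^/]+/[^/]+)/", old_url)
        if m:
            guesses.append(f"{m.group(1)}/{slugify(title)}")
    return guesses

def first_match(guesses: list[str]) -> str | None:
    """Check each guess in order; return the first that resolves (soft-404 checks omitted)."""
    for guess in guesses:
        try:
            req = urllib.request.Request(guess, method="HEAD", headers={"User-Agent": "linkcheck"})
            with urllib.request.urlopen(req, timeout=30) as resp:
                if resp.status == 200:
                    return guess
        except Exception:
            continue
    return None
```

If every guess fails, the sequencing described above falls back to ghost redirects and finally to adding an archive URL.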
Helpful Raccoon, thanks for finding and reporting Fox News; it was helpful on a couple of levels: fixing the links, improving the bot's general code for future domains, and helping to distinguish (or at least name) the concepts of 'ruled mapped redirects' and 'inferred mapped redirects'. -- GreenC 15:14, 10 November 2024 (UTC)
OK. Some redirect, some do not. I'll test them all and migrate the ones that redirect. It increased the search size, since it also includes anything with only an ID number. -- GreenC 16:24, 10 November 2024 (UTC)
Enwiki
Checked 1,654 pages and edited 1,491 pages. Moved 1,492 links to a new URL: 1,389 ruled mapped redirects, 103 ghost mapped redirects. Resolved 22 soft-404s. Removed 1 {{dead link}}. Added 140 {{dead link}}. Switched 107 |url-status=dead to live. Switched 10 |url-status=live to dead. Added 142 archive URLs (114 Wayback). Changed 305 citation metadata.
There are 2,943 articles that match this description, per this search result.
I tried this with several links and it seemed to work fine. I'm not sure how many failed the transfer, but since the ones I tested were fine, it seems like a lot of them still exist.
The Drake link was moved here. The number "1798170489" is the key. I was able to find it in a ghost redirect, as seen here (the old URL redirects to the new URL). It will be a while; I need to get through everything else above first. Looks like about 4,600 pages. -- GreenC 17:55, 30 October 2024 (UTC)
First pass: Checked 4,601 pages and edited 2,924 pages. Moved 3,133 links to a new URL: 3,133 ghost mapped redirects. Switched 120 |url-status=dead to live. Added 73 archive URLs (26 Wayback). Changed 770 citation metadata.
Second pass: Checked 2,607 pages and edited 1,751 pages. Moved 3,493 links to a new URL: 468 inferred CDX mapped redirects, 3,025 ghost mapped redirects. Added 9 {{dead link}}. Switched 32 |url-status=dead to live. Switched 115 |url-status=live to dead. Added 1,199 archive URLs (1,067 Wayback). Changed 213 citation metadata.
Analysis: created a new discovery method: inferred CDX mapped redirects. Converted domain names *.avclub.com to www.avclub.com. Improved ghost redirect detection.
IABot DB
Updated about 11,000 links that propagate to 300+ wikis
I processed time.com in July 2021. It was large and took three days to process. Added 25,000 archive URLs. You can read my strategy in the link. Do you still see a lot of broken links without archive URLs? -- GreenC 01:07, 11 November 2024 (UTC)
Of the first 500 in the above link, 194 don't show archives. If you could filter out the ones without archive URLs for time, it'll help a lot. MrLinkinPark333 (talk) 01:11, 11 November 2024 (UTC)
How are you checking for archives? 194 is about 40%. I just manually checked 50 pages, and every one has an archive (you need to open the page and search for the link; the search result page doesn't provide enough information to determine this), except 3 cases that have a live link. Of those 50, in no case would the bot add an archive URL. I could do this, but it will take a while to process, and I'm not sure how much it will accomplish. BTW the Paul McCartney example link no longer exists in the article, but it does exist in two others. Both have archives. -- GreenC 19:36, 11 November 2024 (UTC)
I only checked the results page and did not manually check each individual article. Is it possible to adjust the search result link above to first calculate how many Time articles don't have archives? Then we could decide what to do next. MrLinkinPark333 (talk) 19:43, 11 November 2024 (UTC)
There is no easy way for this search. But recall Wikipedia:Link_rot/URL_change_requests#ctv.ca, which was also previously done in 2021, and it found 133 more archives. Maybe it's worth trying again. I'll need to build a list of target articles by searching a dump file, since the online search tops out at 10,000 results. -- GreenC05:06, 12 November 2024 (UTC)
If you believe this is easier, feel free to check all of them. Since this request is big, I don't mind if it gets done later after the smaller requests are done. MrLinkinPark333 (talk) 02:16, 13 November 2024 (UTC)
Extracting all the page names that contain time.com requires searching a dump file, which can take 6-8 hours to complete. This is required when the number of results is > 10,000, because Cirrus search (e.g. "insource:...") won't return more than 10k results due to resource constraints on their search server. Cirrus can report how many results there are beyond 10k, but won't display the actual results. I'll need to do the same with deadline.com below, which has 40k results. -- GreenC 19:46, 15 November 2024 (UTC)
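A rough sketch of that kind of dump scan in Python, assuming a local pages-articles dump; the file name and the line-by-line approach are illustrative, not the actual tooling:

```python
import bz2
import re

DUMP = "enwiki-latest-pages-articles.xml.bz2"   # hypothetical local path to a dump
PATTERN = re.compile(r"https?://(?:[a-z0-9-]+\.)?time\.com/", re.I)

def titles_with_pattern(dump_path: str, pattern: re.Pattern) -> set[str]:
    """Stream the dump line by line and collect titles of pages whose wikitext matches.
    Crude, but there is no 10k cap; expect hours of runtime on a full enwiki dump."""
    titles: set[str] = set()
    current_title, matched = None, False
    with bz2.open(dump_path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            if "<title>" in line:
                current_title = line.split("<title>", 1)[1].split("</title>", 1)[0]
                matched = False
            elif not matched and current_title and pattern.search(line):
                titles.add(current_title)
                matched = True
    return titles
```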
Enwiki
Checked 44,901 pages and edited 13,920 pages. Moved 14,455 links to a new URL: 14,074 ruled mapped redirects, 381 ghost mapped redirects. Resolved 7,124 soft-404s. Removed 9 {{dead link}}. Switched 660 |url-status=dead to live. Added 740 archive URLs (446 Wayback). Changed 2,281 citation metadata.
Analysis: almost all 'ruled mapped redirects' are http -> https. Since 'ghost redirects' were not available in 2021, they were discovered this time. Most of the archive URLs were for non-Time.com domains that had a {{dead link}} tag and were repaired incidentally. It was also able to convert many |work=time.com to |work=Time, a feature that did not exist in 2021.
There are over 40,000 pages with deadline.com; limited to www.deadline.com, there are 4,780. This is what I am checking in "Pass 1". -- GreenC 17:16, 15 November 2024 (UTC)
Enwiki
Pass 1: Checked 4,784 pages and edited 4,364 pages. Moved 6,575 links to a new URL: 6,575 ruled mapped redirects. Added 24 {{dead link}}. Switched 98 |url-status=dead to live. Switched 143 |url-status=live to dead. Added 442 archive URLs (401 Wayback). Changed 1,295 citation metadata.
Pass 2: Checked 39,245 pages and edited 5,808 pages. Moved 2,278 links to a new URL: 2,278 ruled mapped redirects. Added 95 {{dead link}}. Switched 32 |url-status=dead to live. Switched 1,119 |url-status=live to dead. Added 2,126 archive URLs (2,018 Wayback). Changed 2,399 citation metadata.
Their former URLs paleodb.org and fossilworks.org have been taken over by The Ecological Register, a seemingly well-meaning site. The old URLs such as:
Dead sub-domains. Can be made live again by converting hostname to "www." .. the hostname might be: origin|games|music|film|news|aux|tv|mobile .. 4,732 pages -- GreenC21:22, 13 November 2024 (UTC)
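A minimal sketch of that hostname rule; the domain placeholder and example path are hypothetical, since only the sub-domain names are given above:

```python
import re

DOMAIN = "example.com"   # placeholder for the domain in this request
DEAD_HOSTS = "origin|games|music|film|news|aux|tv|mobile"

def to_www(url: str) -> str:
    """Ruled mapped redirect: rewrite a dead sub-domain hostname to "www."."""
    return re.sub(
        rf"^(https?://)(?:{DEAD_HOSTS})\.{re.escape(DOMAIN)}/",
        rf"\g<1>www.{DOMAIN}/",
        url,
    )

print(to_www("http://tv.example.com/some/old/path"))   # -> http://www.example.com/some/old/path
```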
Enwiki
Checked 4,742 pages and edited 4,546 pages. Moved 5,181 links to a new URL: 5,156 ruled mapped redirects, 25 ghost mapped redirects. Removed 3 {{dead link}}. Switched 278 |url-status=dead to live. Added 31 archive URLs (22 Wayback). Changed 60 citation metadata.
IABot DB
Checked and updated about 3,500 links that propagate to 300+ wikis
An ideal transition seems difficult as it would require the following steps:
Find an archived version through the Wayback Machine, e.g., https://web.archive.org/web/20240713231341/https://nztop40.co.nz/chart/albums?chart=3467 for the above. For case 2 this requires inferring the URL first (https://nztop40.co.nz/chart/{{#switch:{{{type|}}}|album={{#if:{{{domestic|}}}|nzalbums|albums}}|compilation=compilations|single={{#if:{{{domestic|}}}|nzsingles|singles}}}}?chart={{{id|}}})
Harvest the date 11 August 1991 either from the rendered archived page or from the archived page source, <p id="p_calendar_heading">11 August 1991</p>
For case 2, add |source=newchart and replace |id=1991-08-11.
Note that for case 1, the word after "/archive/" changed according to the following incomplete table. For case 2 this is handled by the template so no need to worry about it.
Old text -> New text
albums -> albums
singles -> singles
nzalbums -> aotearoa-albums
nzsingles -> aotearoa-singles
tereosingles -> te-reo-singles
hotsingles -> hot-singles
hotnzsingles -> hot-aotearoa-singles
If someone is willing to go through the above, at least for the simple cases, I think it is the ideal solution, especially for case 2. Failing that, a simpler archiving procedure could be followed.
For case 1: add |archive-url= and |archive-date= per usual archiving procedure. Add |url-status=deviated. If no archive exists (which should be a minority), add {{dead link}}
For case 2: add |archive-url= and |archive-date= per usual archiving procedure as they are supported by the templates. Add |source=oldchart (even if no archive is found)
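A rough Python sketch of the ideal transition above. The Wayback availability API call, the function names, and the /archive/<new word>/<yyyy-mm-dd> shape of the new URL are assumptions for illustration; only the date harvesting and the word mapping come directly from the steps and table above.

```python
import json
import re
import urllib.parse
import urllib.request
from datetime import datetime

# Word-after-/archive/ mapping, taken from the table above.
CHART_MAP = {
    "albums": "albums",
    "singles": "singles",
    "nzalbums": "aotearoa-albums",
    "nzsingles": "aotearoa-singles",
    "tereosingles": "te-reo-singles",
    "hotsingles": "hot-singles",
    "hotnzsingles": "hot-aotearoa-singles",
}

def closest_snapshot(url: str) -> str | None:
    """Step 1: ask the Wayback Machine availability API for a snapshot of the old URL."""
    api = "https://archive.org/wayback/available?url=" + urllib.parse.quote(url, safe="")
    with urllib.request.urlopen(api, timeout=60) as resp:
        data = json.load(resp)
    snap = data.get("archived_snapshots", {}).get("closest")
    return snap["url"] if snap else None

def chart_date(archived_html: str) -> str | None:
    """Step 2: harvest the date from <p id="p_calendar_heading">11 August 1991</p>."""
    m = re.search(r'<p id=.p_calendar_heading.>\s*([^<]+?)\s*</p>', archived_html)
    return datetime.strptime(m.group(1), "%d %B %Y").strftime("%Y-%m-%d") if m else None

def new_weekly_url(old_archive_word: str, iso_date: str) -> str:
    """Build the replacement URL. The target shape is an assumption extrapolated from
    the end-of-year pattern noted further down; verify against the live site first."""
    return f"https://aotearoamusiccharts.co.nz/archive/{CHART_MAP[old_archive_word]}/{iso_date}"

# Worked example for case 1, using the snapshot mentioned above:
snap = closest_snapshot("https://nztop40.co.nz/chart/albums?chart=3467")
if snap:
    with urllib.request.urlopen(snap, timeout=60) as resp:
        date = chart_date(resp.read().decode("utf-8", errors="replace"))
    if date:
        print(new_weekly_url("albums", date))
```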
Muhandes, I don't see any major hurdles with your ideal solution. It's a lot of citations, worth doing. I'm working through requests on this page chronologically. Might get to here in a week or less. -- GreenC00:51, 15 November 2024 (UTC)
@GreenC: I'm happy to hear that. In the meantime I added records to the table above, which should make it complete to the best of my knowledge. I also noticed that some of the URLs (53 of them, to be exact) add an additional #all_records_extra to the URL, e.g., https://nztop40.co.nz/chart/albums?chart=4413#all_records_extra. I will have a look at them individually and perhaps, since it's only 53, do them manually. --Muhandes (talk) 08:18, 15 November 2024 (UTC)
The pages using #all_records_extra all refer to the Heatseeker charts, which don't seem to be available on the new website. As such, they should be archived, not translated to the new format. --Muhandes (talk) 10:32, 15 November 2024 (UTC)
Case 1 and 2 are different code bases. I have a separate code file for working with external link templates. So I'll initially focus on case 1, then likely some of that code can be reused with case 2. -- GreenC15:07, 19 November 2024 (UTC)
To document an additional variation: the "End of Year" charts, like this, have new URLs like https://aotearoamusiccharts.co.nz/archive/annual-{newcode}/{e}-12-31, where {newcode} is found in the HTML by searching on "<h1>Top Selling [name]", where [name] could be Singles, Albums, NZ Singles, NZ Albums, or Compilations, then extrapolating from the table above. The "{e}" is the year taken from <p id="p_calendar_heading">...</p> -- GreenC 21:11, 19 November 2024 (UTC)
@GreenC: the "discover" charts are the same Heatseeker charts as the #all_records_extra ones. As far as I can tell they are no longer available. The only way to handle it is to find an archive-url. Note that in these cases the oldest archive-url is the best. I have found several cases where a new archive exists but it does not include the chart itself. Muhandes (talk) 23:45, 19 November 2024 (UTC)
Logs: Wikipedia:Link_rot/Cases/nztop40.co.nz. The templates from Case 2 will show up in the tracking category. If there is no archive URL available it won't be able to make the conversion, and likewise won't be able to add an archive URL. Some archive URLs are available, but are soft-404s, or the original URL was not a valid chart page, or the template is malformed. I'll provide a list of the templates that didn't convert, so you can scan for syntax errors; the process is still running. -- GreenC16:17, 20 November 2024 (UTC)
The logs show network failure .. likely Wayback Machine time out (I check for timeouts and have retries but at some point it gives up). I just tried it again, worked first try. I'll rerun the cases that didn't convert. -- GreenC20:24, 21 November 2024 (UTC)
For case 1: Re-ran the 249 pages in Wikipedia:Link_rot/Cases/nztop40.co.nz (first two lists combined) and had only 1 new result. This leads me to believe there were intermittent problems with the Wayback Machine while case 2 was originally running. If you see anything else it missed, let me know and I'll investigate. -- GreenC 22:20, 21 November 2024 (UTC)
The '&' character in the template was not percent-encoded, which caused an API request to return incorrect results.
Certain difficult citations: Grease (1978 soundtrack): {{Certification Table Entry|region=New Zealand|type=album|title=Grease Soundtrack|artist=Various|award=Platinum|number=6|id=5383|salesamount=250,000|certyear=2022|relyear=1978|access-date=21 August 2022|salesref=<ref>{{cite web|url=https://www.americanradiohistory.com/Archive-Billboard/70s/1979/Billboard%201979-03-17.pdf|title=Tax Clouds Growth And Dampens Local Talent Development|publisher=Billboard|page=SA-6|first=Phil|last=Gifford|date=17 March 1979|access-date=31 July 2019}}</ref>}}
Well, Wikipedia follows the 80/20 Rule. It's sort of like climbing Mt. Everest without oxygen. The first 80% is easy. The next 10% is hard. The last 10% is as hard as the previous 90% combined. This is why many people give up once it gets to 90% (or around there) without reaching 100%. The work gets exponentially difficult.
To answer your question about the date offset: I'm embarrassed to say there is a typo in the code, converting "August" to "09" instead of "08". The site then redirected the bogus date page to a working page nearby, September 13, so I never caught it. This is unfortunately the case for everything with an August month. There are about 840 citations in 750 pages that could possibly be a problem, probably about half that since some are legitimate September dates. This will be tricky to fix. I keep logs with old -> new template data that make it possible to recover from this sort of regression.
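As a general illustration (the bot's actual code isn't shown here), deriving month numbers from the standard library, or parsing the whole date at once, avoids the kind of hand-typed month table where "August" can silently become "09":

```python
import calendar
from datetime import datetime

# Build the month-name table from the standard library instead of typing it by hand.
MONTHS = {name: f"{i:02d}" for i, name in enumerate(calendar.month_name) if name}
assert MONTHS["August"] == "08"

# Or skip the table entirely and parse the whole date:
assert datetime.strptime("11 August 1991", "%d %B %Y").strftime("%Y-%m-%d") == "1991-08-11"
```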
'Certification Cite Ref': can you give me the template format? It seems to use different parameters. -- GreenC 03:17, 25 November 2024 (UTC)
Found and fixed the August error: 343 citations in 320 pages. Example. The edit counts in the edit summary are not always accurate due to the way it was done. Ran the CCR template. It only edited 15 pages, but it got the three pages you mentioned, so I suspect it's probably accurate. -- GreenC19:53, 25 November 2024 (UTC)
I finished cleaning up the category. I may deal with the rest of the cases at a later date. Anyway, I believe the bot's work is done. Thank you! Muhandes (talk) 19:12, 3 December 2024 (UTC)
User:Muhandes: Congrats! Nice to see your dedication to reaching 100%. This project required new bespoke code that of course had some bugs on the first and second tries, but you kept error-checking it and narrowed the numbers down to something manageable so the rest could be done manually, which is admirable work. My boilerplate code is well tested, but novel situations like this are often how the boilerplate code gets new features added. Although I've never seen anything like this before, I'll keep it in mind in case the pattern comes up again. -- GreenC 17:57, 10 December 2024 (UTC)
Enwiki
Case 1: Checked 8,904 pages and edited 8,870 pages. Moved 17,224 links to a new URL: 17,224 ruled inferred mapped redirects. Removed 1 {{dead link}}. Added 142 {{dead link}}. Switched 313 |url-status=dead to live. Switched 38 |url-status=live to dead. Added 269 archive URLs (219 Wayback). Changed 8 citation metadata.
The links to currentaffairs.org have been changed. They used to be just:
currentaffairs.org/yyyy/mm/article-name
But have now changed to:
currentaffairs.org/news/yyyy/mm/article-name
At the moment most of the links are being redirected from the old URLs to the new ones. -- LCU ActivelyDisinterested «@» °∆t° 18:43, 30 November 2024 (UTC)
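A minimal sketch of that rule as a Python transform; the example article slug is hypothetical:

```python
import re

def add_news_segment(url: str) -> str:
    """Ruled transform: currentaffairs.org/yyyy/mm/article-name -> currentaffairs.org/news/yyyy/mm/article-name.
    URLs that already contain /news/ are left untouched because the pattern no longer matches."""
    return re.sub(
        r"^(https?://(?:www\.)?currentaffairs\.org)/(\d{4}/\d{2}/)",
        r"\1/news/\2",
        url,
    )

# Hypothetical article slug, for illustration only:
print(add_news_segment("https://www.currentaffairs.org/2021/05/example-article-name"))
# -> https://www.currentaffairs.org/news/2021/05/example-article-name
```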
Hello. Old URLs with only numeric IDs don't work, such as this link for Bruno Mars. I haven't seen replacement URLs on the website. Therefore, I request archives for these URLs, unless new links can be found. ~2,500 pages. Some already have archive URLs added. Thanks! MrLinkinPark333 (talk) 19:21, 6 December 2024 (UTC)
(The 624 represent URLs that were converted from http:// to https:// (a ruled mapped redirect) while, at the same time, a normal redirect was found, followed, and converted to a live link. Because so few archives were added, it appears the domain was previously processed for conversion to archives, though not by WaybackMedic.)
IABot DB
Updated about 6,000 URLs which propagate to 300+ wikis
There are only about 240, probably worth doing, but I bet this pattern (name.com/string/string/0) can be found throughout. -- GreenC 17:52, 17 December 2024 (UTC)
While doing Sports Illustrated below, I found a recently introduced bug that would explain why so few archive URLs were added. I'll need to reprocess the enwiki of people.com -- GreenC18:41, 17 December 2024 (UTC)
Pass 2 (bug fix): Checked 2,577 pages and edited 1,493 pages. Moved 19 links to a new URL: 1 normal redirect, 8 ruled mapped redirects, 10 ghost mapped redirects. Resolved 60 soft-404s. Added 12 {{dead link}}. Switched 5 |url-status=dead to live. Switched 449 |url-status=live to dead. Added 1,349 archive URLs (1,215 Wayback). Changed 9 citation metadata.
These links might be able to convert to new links at si.com. The new URL format is vault.si.com/vault/year/month/day/name-of-article/ - For example: this link is now here for Kenny Anderson (basketball). However, it won't always work as this is now here for Guus Hiddink. As some new URLs also have the subtitle, I suggest trying to convert with the headline only first, then add the subtitle if that doesn't work. Otherwise, I request regular archives if converted URLs aren't found. ~800 articles. Thanks! MrLinkinPark333 (talk) 20:07, 6 December 2024 (UTC)
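A sketch of the suggested approach (headline first, then headline plus subtitle); the slug rule and the liveness check are assumptions, not si.com's documented behaviour:

```python
import re
import urllib.request

def slugify(text: str) -> str:
    """Guess at how the site builds its slugs: lowercase, non-alphanumerics to hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

def vault_candidates(date_path: str, headline: str, subtitle: str | None = None) -> list[str]:
    """Candidate vault.si.com URLs: headline-only first, then headline plus subtitle.
    date_path is 'yyyy/mm/dd' taken from the old URL or the citation date."""
    urls = [f"https://vault.si.com/vault/{date_path}/{slugify(headline)}/"]
    if subtitle:
        urls.append(f"https://vault.si.com/vault/{date_path}/{slugify(headline + ' ' + subtitle)}/")
    return urls

def first_live(urls: list[str]) -> str | None:
    """Return the first candidate that resolves (soft-404 detection omitted for brevity)."""
    for url in urls:
        try:
            req = urllib.request.Request(url, method="HEAD", headers={"User-Agent": "linkcheck"})
            with urllib.request.urlopen(req, timeout=30) as resp:
                if resp.status == 200:
                    return url
        except Exception:
            continue
    return None
```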
Enwiki
Checked 829 pages and edited 251 pages. Moved 254 links to a new URL: 254 inferred mapped redirects. Resolved 496 soft-404s. Removed 1 {{dead link}}. Added 5 {{dead link}}. Switched 195 |url-status=dead to live. Added 16 archive URLs (4 Wayback). Changed 40 citation metadata.
Most of the domain names for The Guardian redirect to new links. For example, this goes here for Art criticism. However not all of them work. Here's what I've found so far:
It will likely be > 10k (Cirrus maxes out at 10k). I can search a dump file for pages that contain *.theguardian, and the bot will internally skip www and <none> links. Also, there are over 2,000 pages with amp (mobile optimized) links to be converted to www. -- GreenC 03:03, 20 December 2024 (UTC)
OK. The original request will be redirects and archived mapped redirects (ghost). The amp will be ruled mapped redirects. Maybe some ruled inferred mapped redirects are possible. Anyway, I need to finish WP:JUDI batch #20 first; it's larger than all previous JUDI batches combined and will require a bunch of runs due to size limits. -- GreenC 03:52, 20 December 2024 (UTC)
I thought of a different way to find them that is pretty simple and better. For example, find all pages with s.theguardian, then repeat for each letter in the alphabet, skipping "w" (www) and "p" (amp). The end result is 104 articles. -- GreenC 15:18, 10 January 2025 (UTC)
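A trivial sketch of generating those per-letter searches; the exact insource query form is an assumption:

```python
import string

# One search per sub-domain letter, skipping "w" (www) and "p" (amp).
queries = [
    f'insource:"{letter}.theguardian.com"'
    for letter in string.ascii_lowercase
    if letter not in ("w", "p")
]
for q in queries:
    print(q)
```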
Enwiki
Checked 104 pages and edited 52 pages. Moved 38 links to a new URL: 3 normal redirects, 35 ruled mapped redirects. Resolved 12 soft-404s. Added 13 {{dead link}}. Switched 2 |url-status=live to dead. Added 4 archive URLs (0 Wayback). Changed 23 citation metadata.
This is to correspond to a system shutdown January 2, 2025 (context). The documents have all been mirrored by a third party. This requires regex to capture a one- to six-digit imported letter ID. Also include http:// versions of these URLs.
2,729 pages. I'll run this soon, ahead of the queue, due to the imminent shutdown, so any links missing archives can still be archived. -- GreenC 18:44, 25 December 2024 (UTC)
I was unable to get to this before the deadline, but was able to convert all but 21, and not all of those are the same type of URL. -- GreenC18:55, 5 January 2025 (UTC)
Checked 2,721 pages and edited 2,713 pages. Moved 3,222 links to a new URL: 4 normal redirects, 3,217 ruled mapped redirects, 1 ghost mapped redirect. Removed 1 {{dead link}}. Added 21 {{dead link}}. Switched 10 |url-status=dead to live. Switched 10 |url-status=live to dead. Added 131 archive URLs (127 Wayback).
IABot DB
Updated about 1,200 links which will propagate to 300+ wikis (note: conversions to archive URLs as mapped redirects are not supported by IABot at this time)
There are two parts of the redirect: the domain, and the path. The domain kicker.ch is Switzerland. When I open a link to kicker.de (I'm in the USA) it stays at kicker.de .. it appears the site is location-aware and redirects the domain based on your location, for some countries. The path also redirects, and that appears to be the same in .de or .ch .. I think the safe thing is to keep kicker.de but change the path. -- GreenC 22:08, 12 January 2025 (UTC)
This site has a complication: some links are "crunchy 404", a page that is partly correct and partly wrong. For example, this is supposed to go to the 2009-10 season, but it redirects to the current season with drop-down menus to find older seasons. In this situation, changing the URL to the redirect results in a loss of information in the URL and citation. So what I have done is identify when URL redirects lose information and leave the existing URL alone, neither changing to the redirect URL nor adding an archive URL. When users click the link it will just follow the natural redirect. If the link or redirect ever stops working, the existing URL will inform them what they were trying to find, so it can be repaired with a new link. -- GreenC 15:51, 13 January 2025 (UTC)
They probably removed older seasons and just redirected to the current one. As long as the current URL works or an archive link is available, it shouldn't have an impact. Nobody (talk) 16:16, 13 January 2025 (UTC)
Enwiki
Checked 6,795 pages and edited 5,771 pages. Moved 23,280 links to a new URL: 3,503 normal redirects, 19,671 ruled mapped redirects, 106 ghost mapped redirects. Resolved 2,127 soft-404s. Removed 4 {{dead link}}. Added 22 {{dead link}}. Switched 713 |url-status=dead to live. Switched 100 |url-status=live to dead. Added 3,352 archive URLs (3,045 Wayback). Changed 11,904 citation metadata.
Hello. A couple of months ago your bot tagged a load of links to singapore-elections.com as usurped and changed the link to a Web Archive one (e.g. here). The website has moved to sg-elections.com and otherwise the URLs are unchanged. Is there a way to replace all the web archive links/usurped tags with the new URLs? The site was quite widely used as a source. Cheers, Number57 01:31, 12 January 2025 (UTC)
I'll need to program a rule "parl-YYYY-ge" --> "general-election/YYYY" .. there will be other rules. Can you help find more rules? The links still exist on about 70 pages. I completed 256 links below in Pass 1. -- GreenC 22:20, 13 January 2025 (UTC)
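A sketch of how such a rule table could work, with the one known rule; the surrounding URL shape (www prefix, file extension, trailing slash) and the example path are assumptions for illustration:

```python
import re

# Path rules found so far; more would be appended as they are discovered.
PATH_RULES = [
    (re.compile(r"^parl-(\d{4})-ge$"), r"general-election/\1"),
]

def convert(old_url: str) -> str | None:
    """Map an old singapore-elections.com URL to sg-elections.com using known path rules."""
    m = re.match(r"^https?://(?:www\.)?singapore-elections\.com/([^?#]+?)(?:\.s?html?)?/?$", old_url)
    if not m:
        return None
    for pattern, repl in PATH_RULES:
        new_path, n = pattern.subn(repl, m.group(1))
        if n:
            return f"https://sg-elections.com/{new_path}/"
    return None   # no rule matched: leave the archive/usurped markup in place

print(convert("http://www.singapore-elections.com/parl-2015-ge"))   # hypothetical example path
# -> https://sg-elections.com/general-election/2015/
```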
@GreenC: Any citation with the date parameter between January 21, 2021 and January 19, 2025 should be changed over (January 20 will have an overlap between administrations, so it should probably be done manually). --Nintendofan885 T&Cs apply 10:48, 26 January 2025 (UTC)
Whitehouse.gov is not fully populated yet. For example, there is nothing for [10] (Office of Management and Budget), the largest office in the Executive Branch, which produces the federal budget. (Re-homed to X.com?)
This is a dynamic site. For example there is a link from 2015 (Obama) that was deleted by Trump (2017) that was restored by Biden (2021) that was deleted by Trump (2025). The bot was able to follow and restore to the version at obamawhitehouse.archives.gov which should be permanent. Another link was active during Trump term 1, Biden kept it active, then Trump term 2 deleted it. The bot restored it to trumpwhitehouse.archives.gov (trump term 1).
Links have multiple migration paths: still working; a redirect to a soft-404 at whitehouse.gov; a missing redirect to whitehouse.gov; a redirect to a working page at archives.gov; a redirect to a soft-404 at archives.gov; a missing redirect to archives.gov; and a dead link needing an archive URL.
No, because they are not web archives. The |archive-url= is for web archives. The list of available web archive providers is at Wikipedia:List of web archives on Wikipedia. The whitehouse.gov links are source URLs which have moved to a new location at the National Archives, where they are then archived into the Wayback Machine. So if the National Archives location dies, we can add a web archive URL for it. This is a common source of confusion: even though it contains "archive" somewhere in the URL, it is still the source location, which itself can have web archives. -- GreenC 16:16, 1 February 2025 (UTC)
Deprecating "soft-redirect" term
I've updated Wikipedia:Link_rot#Glossary to deprecate the term "soft-redirect". It has ambiguity with other meanings, and there is existing terminology for this concept: "mapped redirect" and "missing redirect". -- GreenC23:06, 1 February 2025 (UTC)
The site seems to have been usurped by spam. I've fixed a couple of links manually to point to archives, but it looks like there are a couple dozen other articles that refer to this site. BoredPenguin (talk) 02:13, 30 January 2025 (UTC)
The template has been updated, but the species id parameter needs to be manually(?) updated from # to #/# to reflect the new folder structure.
The id can be determined by searching for the species name at [13] , and copying the numbers from the url, e.g. on the lesser yellowlegs page {{IUCN_Map|22693235|Tringa flavipes}} has been updated to {{IUCN_Map|22693235/208218115|Tringa flavipes}}
Manually searching 500 times is probably days or weeks of work. Do you have an API token, or can you get one? [14] Then we can automate. The URL would look like this, but it currently says "Forbidden" without an API token (registration). More API endpoints here. Not sure which one provides the correct information. -- GreenC 23:18, 23 January 2025 (UTC)
Thanks for the reply! I have no particular knowledge of the site -- just trying to fix a dead link... I believe the purpose of the template is just to create a link to a webpage with an interactive map, so I'm not sure the API would help... it's possible the save option for the advanced search would give a table with species name and url that could make some sort of automating possible, but downloading the results requires logging in (which is beyond my personal commitment level at this time). Random fixer upper (talk)
Solution: with the same example, load this URL [15], which searches the Wayback Machine for every URL https://www.iucnredlist.org/species/22693235/* (wildcard at the end). It gives a bunch of results, only one of which is correct. For each, check the HTML source for the line:
<meta content='IUCN Red List of Threatened Species: Tringa flavipes' name='citation_title'>
.."Tringa flavipes" we are looking for. It's a match, so we know the second number is correct ie. 22693235/208218115 .. that's it, as far as building the map. From there it is plumbing to fix the template. I can do this, but need to get through previous projects first, including Whitehouse.gov. This is a fairly rare scenario, I call it a "Ruled inferred mapped redirect". -- GreenC02:43, 24 January 2025 (UTC)
Random fixer upper, I was able to convert 370, unable to convert 57; conversion rate 87% (80/20 Rule). Rather than upload the conversions and discover there are errors, I posted 25 examples to Wikipedia:Link_rot/Cases/iucnredlist.org. Can you verify it looks OK, before I make the changes? Column 2 is the old and Column 3 is new. I also posted the 57 that require manual conversion. -- GreenC19:48, 3 February 2025 (UTC)
I see a problem: with this, it says "not latest assessment". The message is generated by JavaScript, which is invisible to web scraping and doesn't show up in a headless browser either. The correct page is this. The best solution I can come up with is to sort the IDs numerically and choose the largest, under the assumption that they are assigned chronologically and thus the largest number will be the latest page. This may or may not work out; it's the only solution I can think of. I'm currently reprocessing and will post updated sample results soon. -- GreenC 18:25, 4 February 2025 (UTC)
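A sketch of that tie-breaking heuristic, under the stated assumption that larger IDs are newer assessments:

```python
def latest_assessment(second_ids: set[str]) -> str:
    """Pick the numerically largest second ID, assuming IDs are assigned chronologically
    so the largest one corresponds to the latest assessment."""
    return max(second_ids, key=int)

# The smaller ID here is made up for illustration; the larger is the known-good one.
print(latest_assessment({"123456789", "208218115"}))   # -> 208218115
```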
"Try 3" is a clean sweep. I'm going to call this problem solved and upload the diffs. This was an interesting project: exploring WaybackMachine CDX records to find possible codes, sorting those codes (tricky) into an inference table, and web scraping for titles strings. -- GreenC21:05, 4 February 2025 (UTC)
David O. Johnson: It appears the site has many soft-404s. For example, this (2012 Georgia elections) redirects to this (2004 Bush administration). I'm doing my best to find and convert these to dead links with archive URLs, but it's imperfect. The bad ones are mixed in randomly, with no pattern. It's a poorly maintained website, which is not uncommon, but this one is worse than usual because of a random and high number of false redirects. I already committed some onwiki without realizing it. I could fix many more legit ones, perhaps another 300 or 400, but without manually reviewing each, I can't safely build a redirect map. -- GreenC 19:52, 10 February 2025 (UTC)
The domain www.uptheposh.com has been usurped, and all links (including sublinks like http://www.uptheposh.com/people/580/, http://www.uptheposh.com/seasons/115/transfers/) now redirect to an Indonesian gambling site. Nina Gulat (talk) 16:43, 4 January 2025 (UTC)
This site died in 2012; until 2019 it went to Justhost, and since then, if you look at an archived page on Wayback, a file gets downloaded onto your computer. Putting a URL directly into the browser brings up a squatter search site, but with a URL beginning ww3. Refill has in the past changed the cite URL to one beginning ww1. I think it's safest to just archive and usurp the lot. Lyndaship (talk) 12:50, 9 January 2025 (UTC)
This used to host the public archives of the Swedish city Göteborg. The archives moved to a new address starting from 2019. It is now displaying casino ads.
Links to tornadohistoryproject.com (a widely used source run by the Storm Prediction Center less than a decade ago, especially on older articles written before 2015) now redirect to an unaffiliated third-party essay writing service. Links should be considered usurped and dead where no archive URL is available.
Links to crh.noaa.gov (individual National Weather Service WFO summaries for severe weather events) are dead, but many still remain online on the new weather.gov domain.
For instance, http://www.crh.noaa.gov/dlh/?n=1991halloweenblizzard can now be found at https://www.weather.gov/dlh/1991halloweenblizzard - the syntax is different but can be reasonably changed and the site contents are the same. I'm not sure why crh.noaa.gov isn't a redirect but regardless it's still used in a hell of a lot of weather articles and should be salvaged rather than just labeled as dead where possible. I think this also extends to other domains - if I'm not mistaken, crh is Central Region Headquarters, and there are likely others in the South and a few other parts of the country. Departure– (talk) 14:21, 30 January 2025 (UTC)
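A hedged sketch of that rewrite, generalised from the single example above; other WFO codes and regional domains may not follow the same shape, so each candidate should be verified (for liveness and matching content) before it replaces anything:

```python
import re

def crh_to_weather_gov(url: str) -> str | None:
    """Rewrite crh.noaa.gov/<wfo>/?n=<page> to www.weather.gov/<wfo>/<page>."""
    m = re.match(r"^https?://(?:www\.)?crh\.noaa\.gov/([a-z0-9]{2,4})/\?n=([A-Za-z0-9_-]+)$", url)
    if not m:
        return None
    return f"https://www.weather.gov/{m.group(1)}/{m.group(2)}"

print(crh_to_weather_gov("http://www.crh.noaa.gov/dlh/?n=1991halloweenblizzard"))
# -> https://www.weather.gov/dlh/1991halloweenblizzard
```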
This is tricky because if I replace a dead link with a new live link, but the live link has different content, the old dead link is lost and we don't know what the original link was anymore. But if I keep the old dead link, plus add an archive URL, then the content is preserved. Thus the safer option is to add archives. For example, the DenverSummerHeat example could be converted to this, which is serviceable. -- GreenC 01:30, 7 February 2025 (UTC)
According to the Wayback Machine it was last available on January 1, 2025. That's an ominous date, first of the year, suggesting a cutoff. But it may be too soon to tell; I've seen sites disappear then return months or years later. In the meantime we have dead links. It's only 247 pages. There's nothing for the .in version. Technically speaking, I can move cites from live to dead, then dead back to live again. I recommend treating it as a dead site now, and if it returns to the living, reinstating it. -- GreenC 05:19, 8 February 2025 (UTC)
Following an RfC, www.heritage.org has been blacklisted for being a cybersecurity risk. All URLs in citations should be archived and switched to url-status=unfit so that users don't accidentally click malicious links. Nemo 10:00, 12 February 2025 (UTC)