User:The Anome Source: en.wikipedia.org/wiki/User:The_Anome

This user has been editing Wikipedia for more than twenty years.

The Anome is a second-wave Wikipedian.

The Anome abides.


Interesting reading

To do


Put {{articles-missing-coordinates category}} into every category of that category tree, removing the templates it makes redundant.  Done


Work in progress


  • Modify Template:Taxobox/core to detect if images are completely missing (none of image, image2, image_upright, image2_upright), and to add a category "Taxobox without images", or perhaps "$ORDER taxobox without images", where $ORDER is taken from the taxonomic hierarchy.


Wikidata: knots and links







  • Sort mangling of double-blank-line-spaced paras on tag insertion  Done
  • Sort mangling of double-blank-line-spaced paras on tag removal
  • Add Category:Townlands of Northern Ireland by county to the bot input list  Done
  • Rewrite the Anomebot's internal API access library to use Python's "requests" HTTP library under the hood, to allow the use of persistent HTTPS connections
  • Fix bot exclusion stuff: complex spec, painful to implement -- do a simplified version that just detects the {{bot}} tag
  • OSM semantic bridge: see https://github.com/OSMBrasil/semantic-bridge/tree/master/src
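The simplified bot-exclusion check mentioned in the list above could look something like the sketch below. It is only an illustration, not the Anomebot's code: the function name `page_excludes_bots` is hypothetical, and the assumption that any occurrence of {{bot}}, {{bots}} or {{nobots}} means "keep out" is deliberately stricter than the full exclusion spec.

```python
import re

# Matches {{bot}}, {{bots}} or {{nobots}}, with or without parameters.
# Assumption: any occurrence is treated as "keep out", so parameters such
# as allow= or deny= are ignored rather than interpreted.
BOT_TAG_RE = re.compile(
    r"\{\{\s*(?:no)?bots?\s*(?:\|[^{}]*)?\}\}",
    re.IGNORECASE,
)


def page_excludes_bots(wikitext: str) -> bool:
    """Return True if the page carries any bot-exclusion tag."""
    return BOT_TAG_RE.search(wikitext) is not None
```

Erring on the side of skipping a page costs a few missed edits, but avoids having to implement the per-bot allow/deny logic.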




Short descriptions



Custom political navbox colors

See Category:Navboxes using background colours and Category:Political ideology templates: PetScan

See also: Category:Navboxes using background colours and Category:Political party templates by country: PetScan

Python code to remove color styling from navbox template wikitext: User:The Anome/rmcolors.py
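As a rough illustration of the idea (this is not the contents of rmcolors.py), a minimal sketch might strip `background`, `background-color` and `color` declarations from style values in the wikitext. The function name `strip_color_styles` and the regular expression are assumptions made for this example; other properties such as `border-color` are left alone.

```python
import re

# Matches "background: ...", "background-color: ..." or "color: ..." up to
# and including the terminating semicolon. The lookbehind stops it from
# matching inside longer names such as "border-color" or "bgcolor".
COLOR_DECL_RE = re.compile(
    r'(?<![\w-])(?:background(?:-color)?|color)\s*:\s*[^;|}\n"]+;?\s*',
    re.IGNORECASE,
)


def strip_color_styles(wikitext: str) -> str:
    """Remove colour-related CSS declarations from navbox wikitext."""
    return COLOR_DECL_RE.sub("", wikitext)
```

A pattern this simple can also match prose that happens to look like a declaration, so its output would need checking before saving any edit.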

Discussion: Wikipedia_talk:WikiProject_Templates#Advertising_colors

Articles for creation

Articles for:


Articles for splitting

Geodata to-do

Next steps

{{coord missing}} now assigns pages with to

depending on the Wikidata entity's P625 property.

Next step is to start using data from this for the bot.


Quality control

  • Look into autodetection of low-resolution geocoding of fine-grained objects (villages, buildings, landmarks...). Ping User:Abductive.
  • Investigate why the matcher did not find the Okpilak River, surely a slam-dunk match?
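One possible starting point for the low-resolution autodetection above is to look at how many decimal places the coordinates were written with. The sketch below is an assumption-laden illustration, not existing bot code: the function names and the 1 km threshold are hypothetical, it only handles decimal-degree strings, and it cannot tell a genuine trailing zero from padding.

```python
METRES_PER_DEGREE = 111_320  # approximate length of one degree of latitude


def decimal_places(value: str) -> int:
    """Number of digits written after the decimal point."""
    _, _, fraction = value.strip().partition(".")
    return len(fraction)


def implied_resolution_m(lat: str, lon: str) -> float:
    """Rough ground resolution implied by the coarser of the two values."""
    places = min(decimal_places(lat), decimal_places(lon))
    return METRES_PER_DEGREE / (10 ** places)


def looks_low_resolution(lat: str, lon: str, threshold_m: float = 1_000.0) -> bool:
    """Flag coordinates too coarse for a village, building or landmark."""
    return implied_resolution_m(lat, lon) > threshold_m
```

With these assumptions, coordinates given to two decimal places or fewer are flagged, while three or more pass.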


  • Harvest unused lat/long data from {{infobox settlement}}, and replace with {{coord}}: see here for monitoring script.

    Not many articles have this, so this is likely to affect a couple of hundred articles at most. Still, every little helps.

  • Monitor Category:Pages with malformed coordinate tags
  • Why do Republic of Dagestan etc. articles escape the {{coord missing}} sorter?
  • Possible low-hanging fruit for geocoding: the following categories have thousands of non-geocoded articles that are not getting matched by my current software, and may benefit from special-purpose matching heuristics:
    • Category:Brazil articles missing geocoordinate data (was 3000+ articles, now 2,478 as of 2015-03-25) -- ??
      • Note: most of these appear to be rivers -- just matched 500+ of these by translating GNS names
    • Category:Iran articles missing geocoordinate data (13,000+ articles) -- transliteration problems, presumably
      • It looks like a lot of this might be repetition of the same location in multiple places: the bot's code gets 7000+ multi-matches for Iran
      • See also this paper: Freeman, A. T.; Condon, S. L.; Ackerman, C. M. (2006). "Cross linguistic name matching in English and Arabic: a 'one to many mapping' extension of the Levenshtein edit distance algorithm". Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics: 471. doi:10.3115/1220835.1220895.
      • And this: A verified Arabic-IPA mapping for Arabic transcription ... http://eprints.whiterose.ac.uk/79653/1/brierley14jss.pdf
      • And this: http://geonames.nga.mil/gns/html/romanization.html
    • Category:Pakistan articles missing geocoordinate data (~3500 articles) -- transliteration problems, presumably
      • Note: 700+ multimatches from bot code
    • Category:Philippines articles missing geocoordinate data (2000+ articles) -- ??
      • Note: mostly universities, schools, other locatable organizations, very little here looks bot-matchable.
    • Category:Romania articles missing geocoordinate data (7000+ articles)
      • Note: apparently mostly rivers
    • Category:South Korea articles missing geocoordinate data (1700+ articles) -- not sure what's going on here: fixed my FIPS 10-4 mapping, but that doesn't go very far towards fixing the problem
      • Note: insignificant number (< 100) of multimatches
      • This may be a matter of transliteration: McCune–Reischauer vs. Revised Romanization
    • Category:Turkey articles missing geocoordinate data (5000+ articles) -- lots of places with the same name but in different regions (e.g. 17 villages all called "Akpınar"), the same problem as was found with Polish placenames. (Also: why is Akçakoca failing to be caught?) The bot code finds 3000+ multi-matches for Turkey.
      • Also, this is due to non-standard naming conventions for the hierarchy of Turkish article categories: see, for example, Category:Ankara Province.
      • I've now used spatial disambiguation to resolve some 2000+ of these.

Total is over 27,000 possibles: even doing a fraction of these would make a big dent in the backlog.
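The spatial disambiguation mentioned for the Turkish multi-matches can be sketched roughly as follows. This is an illustration under stated assumptions, not the bot's actual method: it assumes each candidate is a (latitude, longitude) pair from a gazetteer, that a reference point (for example the centre of the province the article is categorised under) is already known, and that a match is accepted only when exactly one candidate lies within `max_km` of that point. The function names and the 100 km default are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def disambiguate(candidates, ref_lat, ref_lon, max_km=100.0):
    """Return the only candidate within max_km of the reference point.

    Returns None when no candidate, or more than one, is close enough,
    so ambiguous cases are left for a human to resolve.
    """
    nearby = [
        (lat, lon) for lat, lon in candidates
        if haversine_km(lat, lon, ref_lat, ref_lon) <= max_km
    ]
    return nearby[0] if len(nearby) == 1 else None
```

Refusing to pick when several candidates are nearby trades coverage for safety, which seems the right way round for an unattended bot.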

Tools of interest

Edit filters

Working effectively

Needing review

Of interest

Character blacklists

Work in progress:

[\x{1D400}-\x{1D7FF}]  # characters from Unicode block Mathematical Alphanumeric Symbols
[\x{2100}-\x{214F}]    # characters from Unicode block Letterlike Symbols
[\x{2460}-\x{24FF}]    # characters from Unicode block Enclosed Alphanumerics
[\x{1F100}-\x{1F1FF}]  # characters from Unicode block Enclosed Alphanumeric Supplement
[\x{FF00}-\x{FFEF}]    # characters from Unicode block Fullwidth and Halfwidth Forms
[\x{2580}-\x{259F}]    # characters from Unicode block Block Elements
[\x{2500}-\x{257F}]    # characters from Unicode block Box Drawing
[\x{1D00}-\x{1D7F}]    # characters from Unicode block Phonetic Extensions
[\x{0250}-\x{02AF}]    # characters from Unicode block IPA Extensions

See this diff for some usernames using these characters, and this diff for adding these to AmandaNP's bot. See also meta:Talk:Title blacklist for global discussion.
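For testing candidate usernames offline, the ranges listed above can be combined into a single Python regular expression. This is a minimal sketch: the names `BLOCKS`, `BLACKLIST_RE` and `contains_blacklisted` are made up for the example, and the title blacklist and AbuseFilter have their own regex dialects, so the pattern is not a drop-in replacement for either.

```python
import re

# Code point ranges copied from the work-in-progress list above.
BLOCKS = {
    "Mathematical Alphanumeric Symbols": (0x1D400, 0x1D7FF),
    "Letterlike Symbols": (0x2100, 0x214F),
    "Enclosed Alphanumerics": (0x2460, 0x24FF),
    "Enclosed Alphanumeric Supplement": (0x1F100, 0x1F1FF),
    "Fullwidth and Halfwidth Forms": (0xFF00, 0xFFEF),
    "Block Elements": (0x2580, 0x259F),
    "Box Drawing": (0x2500, 0x257F),
    "Phonetic Extensions": (0x1D00, 0x1D7F),
    "IPA Extensions": (0x0250, 0x02AF),
}

# One character class covering every listed range.
BLACKLIST_RE = re.compile(
    "[" + "".join(f"{chr(lo)}-{chr(hi)}" for lo, hi in BLOCKS.values()) + "]"
)


def contains_blacklisted(text: str) -> bool:
    """Return True if text uses a character from any listed block."""
    return BLACKLIST_RE.search(text) is not None
```

For example, a username starting with a fullwidth "Ａ" (U+FF21) is flagged, while an ordinary accented letter such as "ï" is not.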

Literal patterns

Also this diff for addition of some of these characters as literal matches.


-- now in Special:AbuseFilter/1168.


-- it looks like the Kelvin symbol got normalized to a letter "K", so I removed it; ditto the Angstrom sign, which also got normalized, and now the ohm sign, which gets normalized to a capital omega


-- and these from the phonetic extensions range, which are AFAIK not used in any natural language

