Operator: 0xDeadbeef
Time filed: 01:48, Thursday, May 5, 2022 (UTC)
Function overview: Removes tracker tags in Twitter links.
Automatic, Supervised, or Manual: Automatic
Programming language(s): Python
Source code available: gist
Links to relevant discussions (where appropriate):
Edit period(s): One time run
Estimated number of pages affected: <3000 per this query
Namespace(s): Mainspace
Exclusion compliant (Yes/No): Yes
Function details: Finds twitter.com URLs and removes the query parameters named s, t, or cxt.
Comments before task change
https://twitter\.com/\w+/status/\d+\?[^\s}<|]+ is used to match the URL; urllib is then used to parse it and remove the parameters. 0xDeadbeef (T C) 15:19, 14 May 2022 (UTC)
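The match-then-parse approach described above can be sketched as follows. This is a minimal illustration and not the bot's actual source: the names `TWEET_URL`, `TRACKING_PARAMS`, `strip_tracking`, and `clean_wikitext` are hypothetical, and only the regex and the parameter names s, t, and cxt come from the request.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Regex quoted in the request: a tweet URL followed by a query string.
TWEET_URL = re.compile(r"https://twitter\.com/\w+/status/\d+\?[^\s}<|]+")

# Parameter names listed under "Function details".
TRACKING_PARAMS = {"s", "t", "cxt"}


def strip_tracking(url: str) -> str:
    """Return the URL with the tracking parameters removed from its query."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS
    ]
    # urlencode() re-encodes the surviving values, so unusual escapes may be
    # normalised; an empty query drops the trailing "?" entirely.
    return urlunsplit(parts._replace(query=urlencode(kept)))


def clean_wikitext(text: str) -> str:
    """Rewrite every matched tweet URL in a block of wikitext."""
    return TWEET_URL.sub(lambda match: strip_tracking(match.group(0)), text)
```

For example, `clean_wikitext("[https://twitter.com/user/status/123?s=20 Tweet]")` leaves the link label untouched and drops only the query string. Parameters other than s, t, and cxt are kept in their original order.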
https:\/\/twitter\.com\/\w+\/status\/\d+\?[^\s}<|]+ for regex, to escape the / characters. (Same for below). Headbomb {t · c · p · b} 01:13, 17 May 2022 (UTC)
. and \. have different meanings in regex. 0xDeadbeef (T C) 02:30, 17 May 2022 (UTC)
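The two points raised in the comments above can be checked directly in Python's `re` module. The snippet below is only an illustration; the variable names are made up for this example.

```python
import re

# An escaped dot matches only a literal "."; a bare dot matches any character.
escaped = re.compile(r"twitter\.com")
unescaped = re.compile(r"twitter.com")

dot_literal_only = escaped.search("twitterXcom") is None   # escaped form rejects "X"
dot_matches_any = unescaped.search("twitterXcom") is not None

# In Python the pattern is an ordinary string, so "/" is not a delimiter and
# needs no escaping. Escaping it anyway is harmless: both forms match the same text.
plain = re.compile(r"https://twitter\.com/")
slashed = re.compile(r"https:\/\/twitter\.com\/")

same_match = (
    plain.fullmatch("https://twitter.com/") is not None
    and slashed.fullmatch("https://twitter.com/") is not None
)
```

Escaping the slashes matters only in languages or tools that delimit the pattern with `/`, such as JavaScript regex literals.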
/ needs to be escaped, but in Python RegEx is just given as a string ' . . . ' ― Qwerfjkl talk 14:22, 29 May 2022 (UTC)
[^/] or [\s=>] for it to be primary. 0xDeadbeef (T C) 02:07, 15 May 2022 (UTC)
https://www.webcitation.org/6d0sXMyOT?url=https://twitter.com .. couple others use ?url= vs. "/" as the break point. -- GreenC 03:12, 15 May 2022 (UTC)
{{Foo|1=https://twitter.com}}
https://www.webcitation.org/6d0sXMyOT?url=https://twitter.com
0xDeadbeef (T C) 04:03, 15 May 2022 (UTC)
(?<!\?url=|/|cache:)https://twitter\.com/\w+/status/\d+/?\?[^\s}<|]+
0xDeadbeef (T C) 04:25, 16 May 2022 (UTC)
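A sketch of how the lookbehind above could be applied to skip Twitter links embedded in archive URLs. One assumption is made here: as far as I know, Python's built-in `re` module rejects a lookbehind whose alternatives differ in width, so the single group `(?<!\?url=|/|cache:)` is written as three chained lookbehinds, which should be equivalent. The names `PRIMARY_TWEET_URL` and `find_primary_links` are hypothetical.

```python
import re

# Three fixed-width negative lookbehinds stand in for the single
# (?<!\?url=|/|cache:) group; the rest of the pattern is as quoted above.
PRIMARY_TWEET_URL = re.compile(
    r"(?<!\?url=)(?<!/)(?<!cache:)"
    r"https://twitter\.com/\w+/status/\d+/?\?[^\s}<|]+"
)


def find_primary_links(text: str) -> list[str]:
    """Return tweet URLs that are not part of an archive or cache URL."""
    return [match.group(0) for match in PRIMARY_TWEET_URL.finditer(text)]
```

With this pattern, a link inside `{{Foo|1=...}}` is still found because it is preceded by `=`, while the same link following `?url=`, `/`, or `cache:` is left alone.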
{{cite web |url=//twitter.com}}. They are so uncommon, and can be tricky, that it would probably be OK to skip or log them if they don't fit the regex. -- GreenC 05:21, 16 May 2022 (UTC)
insource:. 0xDeadbeef (T C) 04:23, 21 May 2022 (UTC)