Explanation

This page exists solely as a list of articles to be processed to clean up bare URLs (see WP:Bare URLs). I do that by feeding these lists to Citation bot.

In the vast majority of cases, these are articles in which I have no interest other than fixing bare URL references (BURs).

Note that the definition of "bare URL reference" (BUR) used here is narrow: ref tags which contain only the URL, optionally preceded or followed by spaces, and/or enclosed in square brackets [].

For example:

  • bare URL, with no spaces: <ref>https://www.example.com/foo</ref>
  • bare URL, with spaces: <ref> https://www.example2.com/foobar </ref>
  • bracketed URL, with no spaces: <ref>[https://www.example.com/foo]</ref>
  • bracketed URL, with spaces: <ref> [https://www.example2.com/foobar] </ref>
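The narrow definition above can be sketched as a small Python check. This is only an illustration written from the prose definition: the pattern `NARROW_BUR` and the function `is_bare_url_ref` are hypothetical names, and this is not the regex used for the actual scans.

```python
import re

# Hypothetical pattern derived from the prose definition above:
# a ref tag holding only a URL, optionally padded with whitespace
# and/or wrapped in square brackets. Each bracket is optional on its
# own, so an unbalanced bracket would also pass this simple check.
NARROW_BUR = re.compile(
    r"<ref[^>]*>\s*\[?\s*https?://[^\s\[\]<>|]+\s*\]?\s*</ref>",
    re.IGNORECASE,
)


def is_bare_url_ref(ref: str) -> bool:
    """Return True if a single <ref>...</ref> tag is a bare URL reference."""
    return NARROW_BUR.fullmatch(ref.strip()) is not None
```

Under this sketch, the four examples above all count as BURs, while a bracketed link with a title, such as `<ref>[https://www.example.com/foo Example]</ref>`, does not.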

There are of course many other types of inadequately described citation. This exercise targets only the simplest, worst examples. However, when Citation bot processes a page, it can fix many other citation issues, so this exercise fixes more than just the targeted problem.

Methodology

These lists of pages with new bare URL refs are derived from the database dumps published twice a month at https://dumps.wikimedia.org/enwiki/.

The methodology is simple:

  1. use WP:AWB's database scanner to find all the articles with BURs (ABURs) in the most recent database dump, using the regex <ref[^>]*?>\s*https?:[^>< \|\[\]]+(?<!\.pdf)\s*<\s*/\s*ref.
    (Note that this regex ignores URLs which end with ".pdf". That is because Citation bot cannot find titles or other metadata for PDFs.)
  2. use the same method to find all the ABURs in a previous database dump
  3. use AWB's list comparer tool to find the articles which are in the newer list, but not the older list
  4. use AWB's pre-parse mode to keep only those articles which still have one or more BURs, checked using the same regex as above: <ref[^>]*?>\s*https?:[^>< \|\[\]]+(?<!\.pdf)\s*<\s*/\s*ref
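The steps above can be sketched in Python for readers who want to see the logic outside AWB. This is a minimal illustration, not the AWB workflow itself: the function names and the idea of passing each dump as a title-to-wikitext dictionary are assumptions made for the example. The regex is the one quoted in step 1, unchanged.

```python
import re

# The regex quoted in step 1, copied as-is. The negative lookbehind
# (?<!\.pdf) rejects URLs that end in ".pdf".
BUR_PATTERN = re.compile(
    r"<ref[^>]*?>\s*https?:[^>< \|\[\]]+(?<!\.pdf)\s*<\s*/\s*ref"
)


def has_bur(wikitext: str) -> bool:
    """Return True if the page text contains at least one match."""
    return BUR_PATTERN.search(wikitext) is not None


def new_aburs(older_dump: dict, newer_dump: dict) -> set:
    """Steps 1-3: titles with a BUR in the newer dump but not the older.

    Both arguments are hypothetical {title: wikitext} mappings.
    """
    older = {title for title, text in older_dump.items() if has_bur(text)}
    newer = {title for title, text in newer_dump.items() if has_bur(text)}
    return newer - older


def still_bare(titles: set, live_text: dict) -> set:
    """Step 4: keep only titles whose current text still has a BUR."""
    return {t for t in titles if has_bur(live_text.get(t, ""))}
```

As transcribed here, the pattern does not allow a leading `[`, so this sketch matches the unbracketed forms only.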

Scope

This list is based on a comparison of the 20220520 dump (99,631 ABURs) with the previous 20220501 dump (111,364 ABURs).

The comparison found:

  • 17,734 ABURs which are only in the older (20220501) dump
  • 93,630 ABURs which are in both old and new database dumps
  • 6,001 ABURs which are only in the newer (20220520) dump
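The three figures above are what a comparison of two lists produces, and they can be cross-checked against the dump totals. The following Python sketch is illustrative only: `compare_lists` and `counts_consistent` are hypothetical helpers, not part of AWB.

```python
def compare_lists(older: set, newer: set):
    """Split two title sets into (only older, in both, only newer)."""
    return older - newer, older & newer, newer - older


def counts_consistent(old_total, new_total, only_old, in_both, only_new):
    """Check that the three buckets add up to the two dump totals."""
    return (only_old + in_both == old_total
            and in_both + only_new == new_total)


# Figures quoted above for the 20220501 and 20220520 dumps.
figures_ok = counts_consistent(
    old_total=111_364,
    new_total=99_631,
    only_old=17_734,
    in_both=93_630,
    only_new=6_001,
)
```

Here 17,734 + 93,630 gives the older total of 111,364, and 93,630 + 6,001 gives the newer total of 99,631.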

After pre-parsing the list of 6,001 new ABURs to remove any where the refs had been filled since the database dump was made, there were 4,055 ABURs to be processed by Citation bot. The list was split into two chunks for processing.

That first pass removed all the bare refs on 2,074 pages, just over half the set. The other 1,931 pages were processed again as "Take 2", because some URLs need multiple passes before they are filled.

After Take 2 was processed, 1,404 pages still had one or more bare links. Those pages are Take 3.

After Take 3 was processed, 1,249 pages still had one or more bare links. Those pages are Take 4.

After Take 4 was processed, 1,039 pages still had one or more bare links. Those pages are Take 5.

After Take 5 was processed, 887 pages still had one or more bare links. Those pages are Take 6.
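The diminishing returns of the later passes can be read off by subtracting each Take's list size from the previous one. A small Python sketch, using only the page counts quoted above (the names `take_sizes` and `cleared_per_take` are chosen for this example):

```python
# Number of pages fed into each Take, as quoted above.
take_sizes = {2: 1931, 3: 1404, 4: 1249, 5: 1039, 6: 887}


def cleared_per_take(sizes: dict) -> dict:
    """Pages fully cleared of bare links by each Take.

    The pages cleared by Take n are those in Take n's list that do
    not reappear in the list for Take n+1.
    """
    takes = sorted(sizes)
    return {t: sizes[t] - sizes[nxt] for t, nxt in zip(takes, takes[1:])}


cleared = cleared_per_take(take_sizes)
# cleared == {2: 527, 3: 155, 4: 210, 5: 152}
```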

Lists

Blanked, to prevent accidental re-submission.
The latest batch, "New ABURs between 20220501 and 20220520, Take 6" (887 pages), is currently being processed by Citation bot, as of 10:31, 30 May 2022 (UTC).