the original query includes redirect pages, which do not have extracts
btw boolean parameters like exintro and explaintext do not need a value - they are true if present and false if absent, so even exintro=false counts as true
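a minimal python sketch of building a query string that follows this rule. build_query is a hypothetical helper, not part of any library: it sends a True value as a bare key and leaves a False value out entirely, because sending it with any value would still read as true

```python
from urllib.parse import urlencode


def build_query(params: dict) -> str:
    """Build a MediaWiki API query string from a dict of parameters.

    Boolean parameters are true when present and false when absent, so a
    True value is sent as a bare key and a False value is dropped entirely.
    (Sending exintro=false would still be read as true.)
    """
    values = {}
    flags = []
    for key, value in params.items():
        if value is True:
            flags.append(key)  # present with no value means true
        elif value is False or value is None:
            continue  # absent means false
        else:
            values[key] = value
    parts = [urlencode(values)] if values else []
    parts.extend(flags)
    return "&".join(parts)


# hypothetical usage: an extracts query with exintro on and explaintext off
print(build_query({
    "action": "query",
    "prop": "extracts",
    "titles": "Music",
    "exintro": True,
    "explaintext": False,
    "format": "json",
}))
# prints: action=query&prop=extracts&titles=Music&format=json&exintro
```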
looking at forum posts on the wikipedia search project, it's clear there are a lot of confusing examples. some use opensearch queries, which refer to an obsolete search protocol that was pretty much DOA. others use search as a generator when it can be used directly as a list, as I showed above
the wikipedia api is quite powerful, so it's best to start with the simplest query possible and then explore more complex ones
this is the place to start
https://www.mediawiki.org/wiki/API:Search
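the search api takes the search terms in the srsearch parameter (the query strings below go after https://en.wikipedia.org/w/api.php?). this query searches for “music” with the default search, which on wikipedia matches page text as well as titles - the snippets in the response below come from article text. it is the simplest form, has good defaults, includes a snippet of each match and resolves redirects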
action=query&list=search&srsearch=music&origin=*&format=json&formatversion=2&utf8
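utf8 makes the json contain non-English characters as plain UTF-8 rather than \uXXXX escape sequences, which is the best choice for non-English text (with formatversion=2 this is already the default, so the flag is redundant but harmless)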
formatversion=2 gives a cleaner json layout (e.g. batchcomplete is a real boolean true instead of an empty string) - it is explained here https://www.mediawiki.org/wiki/API:JSON_version_2
origin=* is needed for anonymous cross-origin (CORS) requests made from a browser - explained here https://www.mediawiki.org/wiki/API:Cross-site_requests
page content can also be searched explicitly by adding srwhat=text
action=query&list=search&srwhat=text&srsearch=music&origin=*&format=json&formatversion=2&utf8
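a minimal python sketch of sending these search queries with only the standard library. search_url and search are hypothetical helper names, the en.wikipedia.org endpoint is an assumption (other wikis have their own /w/api.php), and the User-Agent string is a placeholder - Wikimedia asks clients to send a descriptive one

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# assumption: English Wikipedia; other wikis use their own /w/api.php
API = "https://en.wikipedia.org/w/api.php"


def search_url(term, what=None):
    """Return the list=search url used above, optionally with srwhat."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": term,
        "origin": "*",  # only matters for browser CORS requests
        "format": "json",
        "formatversion": "2",
    }
    if what is not None:
        params["srwhat"] = what  # e.g. "text" to search page content
    # utf8 is a boolean flag, so it is appended as a bare key
    # (redundant with formatversion=2 but kept to match the queries above)
    return API + "?" + urlencode(params) + "&utf8"


def search(term, what=None):
    """Fetch one page of results and return the decoded json response."""
    # placeholder User-Agent: replace with your own tool name and contact
    req = Request(search_url(term, what),
                  headers={"User-Agent": "search-example/0.1 (placeholder contact)"})
    with urlopen(req, timeout=10) as resp:
        return json.load(resp)


# hypothetical usage (needs network access):
# results = search("music", what="text")
```

urlencode turns origin=* into origin=%2A, which the api decodes back to *.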
the response looks like this (trimmed to the first and last entries)
{
  "batchcomplete": true,
  "continue": {
    "sroffset": 10,
    "continue": "-||"
  },
  "query": {
    "search": [
      {
        "timestamp": "2017-08-28T22:23:08Z",
        "size": 139746,
        "snippet": "<span class=\"searchmatch\">Music</span> is an art form and cultural activity whose medium is sound organized in time. The common elements of <span class=\"searchmatch\">music</span> are pitch (which governs melody and harmony)",
        "pageid": 18839,
        "wordcount": 17243,
        "ns": 0,
        "title": "Music"
      },
      ... more array entries in between ...
      {
        "snippet": "institution can also be known as a school of <span class=\"searchmatch\">music</span>, <span class=\"searchmatch\">music</span> academy, <span class=\"searchmatch\">music</span> faculty, college of <span class=\"searchmatch\">music</span>, <span class=\"searchmatch\">music</span> department (of a larger institution), conservatory",
        "ns": 0,
        "wordcount": 2320,
        "pageid": 24782280,
        "title": "Music school",
        "size": 18997,
        "timestamp": "2017-08-30T08:17:18Z"
      }
    ],
    "searchinfo": {
      "totalhits": 734632
    }
  }
}
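a minimal python sketch of reading a response shaped like this one. summarize and next_request are hypothetical helpers, and the sample dict is trimmed from the response above. next_request merges the continue object (sroffset plus continue) back into the original request parameters to get the next page of results, which is how the api's continuation works

```python
import html
import re

# strips any html tag; in search snippets that is the searchmatch highlight span
_TAG = re.compile(r"<[^>]+>")


def summarize(response):
    """Return (title, plain-text snippet) pairs from a list=search response."""
    results = []
    for hit in response["query"]["search"]:
        snippet = html.unescape(_TAG.sub("", hit.get("snippet", "")))
        results.append((hit["title"], snippet))
    return results


def next_request(params, response):
    """Return params for the next page of results, or None when there are no more."""
    cont = response.get("continue")
    if cont is None:
        return None  # no continue object: this was the last page
    return {**params, **cont}


# trimmed from the response above
sample = {
    "batchcomplete": True,
    "continue": {"sroffset": 10, "continue": "-||"},
    "query": {"search": [{
        "title": "Music school",
        "pageid": 24782280,
        "snippet": "institution can also be known as a school of <span class=\"searchmatch\">music</span>",
    }]},
}

print(summarize(sample))
# prints: [('Music school', 'institution can also be known as a school of music')]
```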
a page can be opened from its title by replacing spaces with underscores - e.g. the “Music school” entry above has the url
https://en.wikipedia.org/wiki/Music_school
or by its page id in a url
https://en.wikipedia.org?curid=24782280
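a minimal python sketch of building both kinds of url. title_url and pageid_url are hypothetical helpers, en.wikipedia.org is an assumption, and the percent-encoding only approximates what MediaWiki itself produces (it may escape a few characters, such as parentheses, that MediaWiki leaves alone, which still yields a working url)

```python
from urllib.parse import quote

BASE = "https://en.wikipedia.org"  # assumption: English Wikipedia


def title_url(title):
    """Article url from a title: spaces become underscores, the rest is percent-encoded.

    / and : stay readable (e.g. "AC/DC", "Help:Contents"), while characters
    such as ? and & are escaped so they cannot break the url.
    """
    return BASE + "/wiki/" + quote(title.replace(" ", "_"), safe="/:")


def pageid_url(pageid):
    """Article url from a numeric page id via the curid parameter."""
    return BASE + "/?curid=" + str(pageid)


print(title_url("Music school"))  # prints: https://en.wikipedia.org/wiki/Music_school
print(pageid_url(24782280))       # prints: https://en.wikipedia.org/?curid=24782280
```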
or the page html can be retrieved through the parse api
action=parse&pageid=24782280&prop=text&origin=*&format=json&formatversion=2&utf8
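a minimal python sketch of the parse request, again with only the standard library. parse_url, page_html and fetch_html are hypothetical helpers, the endpoint and the User-Agent string are assumptions as before. with formatversion=2 the rendered html is a plain string in parse.text (version 1 wraps it in an object under a "*" key), and a failed request comes back with an error object instead of parse

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://en.wikipedia.org/w/api.php"  # assumption: English Wikipedia


def parse_url(pageid):
    """Url for the action=parse request above."""
    params = {
        "action": "parse",
        "pageid": str(pageid),
        "prop": "text",
        "origin": "*",
        "format": "json",
        "formatversion": "2",
    }
    return API + "?" + urlencode(params) + "&utf8"


def page_html(response):
    """Pull the rendered html out of a formatversion=2 parse response."""
    if "error" in response:
        raise ValueError(response["error"].get("info", "api error"))
    return response["parse"]["text"]


def fetch_html(pageid):
    """Fetch and return the rendered html for a page id (needs network access)."""
    # placeholder User-Agent: replace with your own tool name and contact
    req = Request(parse_url(pageid),
                  headers={"User-Agent": "parse-example/0.1 (placeholder contact)"})
    with urlopen(req, timeout=10) as resp:
        return page_html(json.load(resp))


# hypothetical usage: html = fetch_html(24782280)
```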