Schema and guidelines for creating a staticSearch engine for your HTML5 site
Martin Holmes
Joey Takeda
2019-2024
This documentation provides instructions on how to use the Project Endings staticSearch
Generator to provide a fully-functional search ‘engine’ to your website without any
dependency on server-side code such as a database.
A bug which caused a VERSION file to be created containing a git error message if
the user chose to customize the test suite instead of creating their own configuration
has been fixed. A random version identifier is created instead.
10.3 Changes in version 1.4.7
Bug fixes:
A bug which caused a tokenization error because the XSLT processor was using an old
Unicode dataset has been handled with a temporary workaround, adding the codepoint
U+A78F to the tokenization mechanism.
When searching for a phrase containing an ampersand, the phrase would be successfully
found, but if the same search was reloaded from the browser URL, it would fail. This
has now been corrected. Thanks to the TMP team for reporting this.
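The ampersand problem belongs to a general class of URL round-trip errors, which the sketch below illustrates. It is not the staticSearch code: the parameter name q, the example URL, and both function names are assumptions made for illustration. An unencoded ampersand in a query string is read as a parameter separator when the URL is parsed again, so the phrase has to be percent-encoded before it is written to the URL.

```javascript
// Hypothetical helpers; the parameter name "q" is an assumption for illustration.

// Build a search URL, percent-encoding the phrase so that "&" becomes "%26".
function buildSearchUrl(base, phrase) {
  return base + '?q=' + encodeURIComponent(phrase);
}

// Read the phrase back from a URL, as a page would when it is reloaded.
function readSearchPhrase(url) {
  return new URL(url).searchParams.get('q');
}

const phrase = '"Smith & Sons"';
const safeUrl = buildSearchUrl('https://example.com/search.html', phrase);
const unsafeUrl = 'https://example.com/search.html?q=' + phrase;

console.log(readSearchPhrase(safeUrl) === phrase);   // true: the phrase survives
console.log(readSearchPhrase(unsafeUrl) === phrase); // false: cut off at the ampersand
```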
10.4 Changes in version 1.4.6
Bug fixes:
A bug which caused weighting values to be lost during tokenizing has been corrected.
The mechanism which scrolls the search page automatically to display search results
has been adjusted in an attempt to avoid hiding the first result(s) behind a sticky
or fixed page header.
The build process has been modified in an attempt to ensure that the search page which
is generated contains one and only one charset declaration.
A warning is now issued when no title is found for a document being indexed. This
situation means that any search results will be less useful to the end user because
no document title can be displayed for the document.
10.5 Changes in version 1.4.5
Bug fixes:
Two configuration parameters, <stopwordsFile> and <dictionaryFile>, were previously documented as optional, but in fact the build process requires them.
The documentation has been updated.
There was a mismatch between files which were included in the indexing process and
files which were checked for well-formedness ahead of indexing; it was possible for
files that were not intended to be indexed at all to be checked and fail the well-formedness
test even though they were not part of the build. This has been fixed.
10.6 Changes in version 1.4.4
Bug fix:
A deprecated JavaScript function, substr(), which was used in one line of code only, has been replaced by substring().
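As a brief illustration of the difference between the two functions (the line actually changed in staticSearch is not shown here, and the sample string is arbitrary): substr() takes a start index and a length, whereas substring() takes a start index and an end index.

```javascript
// substr(start, length) is deprecated; substring(start, end) takes an end index instead.
const word = 'staticSearch';

const legacy = word.substr(6, 6);          // start at index 6, take 6 characters
const replacement = word.substring(6, 12); // start at index 6, stop before index 12

console.log(legacy);      // "Search"
console.log(replacement); // "Search"
```

For a non-negative start, substr(start, length) can be rewritten as substring(start, start + length). Negative arguments behave differently: substr(-3) counts from the end of the string, while substring() treats a negative value as 0, so slice(-3) is the usual replacement in that case.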
10.7 Changes in version 1.4.3
New feature:
A new method, preProcessSearchString(), has been added to the StaticSearch class. It is
called from parseSearchQuery() and does nothing as defined, but end-users can override it
to do any special processing they might need on an input search string. This
is sometimes necessary when (for example) specific diacritics are being ignored by
the search indexing, but end-users might type them in anyway. The search string is
now also Unicode-normalized (to NFC) at the beginning of the parsing function.
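The following is a minimal sketch of how such an override might look. The real StaticSearch class is not reproduced here: StaticSearchStandIn is a stand-in written only so that the example runs on its own, its internals are assumptions, and the choice of diacritic to strip is arbitrary.

```javascript
// Stand-in for the real StaticSearch class, reduced to the two methods named above.
// Its internals are assumptions made only so that this example is self-contained.
class StaticSearchStandIn {
  // Default hook: returns the string unchanged.
  preProcessSearchString(str) {
    return str;
  }
  parseSearchQuery(str) {
    // Normalize to NFC first, then give the hook a chance to alter the string.
    return this.preProcessSearchString(str.normalize('NFC'));
  }
}

// Override the hook to strip an acute accent that the index is assumed to ignore.
class MySearch extends StaticSearchStandIn {
  preProcessSearchString(str) {
    return str.replace(/\u00E9/g, 'e'); // é -> e
  }
}

const search = new MySearch();
// "e" + combining acute accent becomes a single "é" under NFC; the hook then maps it to "e".
console.log(search.parseSearchQuery('cafe\u0301')); // "cafe"
```

Because normalization happens before the hook in this sketch, the override only has to handle the precomposed form of each character.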
10.8 Changes in version 1.4.2
Bug fix:
A bug which could cause a divide-by-zero error in the JSON generation stage of the
build when the target site consisted of fewer than ten documents was fixed via a pull
request from Norman Walsh. Thanks Norman!
10.9 Changes in version 1.4.1
Bug fix:
A minor bug whereby a rare combination of circumstances could lead to document hit
scores being reported as concatenated numbers rather than summed numbers has been
fixed.
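The release note does not say what the rare combination of circumstances was. The sketch below only illustrates the general JavaScript behaviour behind this kind of symptom, using made-up score values: when a number arrives as a string, the + operator concatenates instead of adding.

```javascript
// Scores that arrive as strings (for example, from parsed text) concatenate under "+".
const scores = ['3', '4', '5'];

// Bug pattern: 0 + '3' is the string "03", and every later "+" keeps concatenating.
const concatenated = scores.reduce((total, s) => total + s, 0);

// Fix pattern: convert each value to a number before adding.
const summed = scores.reduce((total, s) => total + Number(s), 0);

console.log(concatenated); // "0345"
console.log(summed);       // 12
```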
10.10 Changes in version 1.4
Deprecations requiring changes to existing projects:
Filter sort keys must be declared using the all-lowercase data attribute data-ssfiltersortkey. Although the documentation in 7.1.1.1 Sort order for description filters correctly specified the attribute's name, the processing code accepted only the camel-case
version, which is invalid XHTML5. In 1.4, using data-ssFilterSortKey will result in WARNINGs; in all subsequent versions, using the camel-case attribute
name will result in build failures.
New features and enhancements:
A new ‘Search only in’ feature has been added. This enables you to specify regions of documents and label
them using the label attribute on <context> (per issue #20). Users can then check only the regions they would like to get search results from.
A new feature filter has been added to the collection of search facets. This provides an option for cases
where the number of items in a description filter might be so large that providing
a long list of individual checkboxes for all the options is not practical. Instead,
the feature filter offers a method of finding and selecting items using a typeahead
control.
When you navigate back or forward to a previous search, if any of the filters which
were used in that search are hidden inside closed HTML details elements, those elements
will be opened.
A noscript element is now inserted into the search page to handle cases where JavaScript
is turned off in the user's browser.
A ‘Loading...’ splash screen is shown when the search page is initially configuring
itself.
All inline CSS and JavaScript has now been moved to external files, to better suit
Content Security Policy constraints.
The staticSearch report has been simplified and no longer produces a concordance of
stems by default. The concordance can be built at the command line by calling the
concordance target in ant.
The version attribute has been added to the root <config> element to better future-proof the alignment of configuration files and the staticSearch
codebase. See 7.5.1 The config element for more details.
Bug fixes:
A bug which caused number filters to be ignored when navigating back to a previous
search has been fixed.
A bug which caused the tokenizer to assume word breaks when encountering certain diacritics
has been fixed.
All issues and tickets related to version 1.4 can be found on GitHub.
10.11 Changes in version 1.3
Note that version 1.2 was withdrawn in favour of version 1.3, so the list below includes
changes from the original version 1.2 and the current 1.3.
Deprecations requiring changes to existing projects:
All staticSearch classes with periods have been deprecated in favour of underscores,
because periods in class names conflict with the standard "." chaining selector in CSS
and JavaScript. (See issue 149 for the full discussion.) This affects the majority of staticSearch meta classes, which should be changed from
staticSearch. to staticSearch_; see the full list below:
DEPRECATED VALUE           REPLACE WITH
staticSearch.desc          staticSearch_desc
staticSearch.bool          staticSearch_bool
staticSearch.num           staticSearch_num
staticSearch.date          staticSearch_date
staticSearch.docTitle      staticSearch_docTitle
staticSearch.docImage      staticSearch_docImage
staticSearch.docSortKey    staticSearch_docSortKey
For version 1.3, using the deprecated period syntax will result in WARNINGs; in all
subsequent versions, using the period syntax will result in build failures.
The original parameters config and configFile have been renamed ssConfig and ssConfigFile to minimize the chances of naming collisions with parameters in other build processes.
IF YOU HAVE SCRIPTED A staticSearch BUILD AS PART OF YOUR OWN BUILD PROCESS, YOU WILL
NEED TO UPDATE THESE PARAMETER NAMES.
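If a project has many pages to update, the class renaming listed earlier in this section could be automated. The function below is a hypothetical sketch, not part of staticSearch: it rewrites the value of a class attribute and touches only the seven deprecated values listed above, leaving every other class untouched.

```javascript
// Suffixes of the seven deprecated class values listed in the table above.
const RENAMED = ['desc', 'bool', 'num', 'date', 'docTitle', 'docImage', 'docSortKey'];

// Hypothetical helper: convert period-style staticSearch classes in a class
// attribute value to the underscore style, leaving all other classes alone.
function migrateClassValue(classValue) {
  return classValue
    .split(/\s+/)
    .filter((token) => token.length > 0)
    .map((token) => {
      const match = /^staticSearch\.(\w+)$/.exec(token);
      return match && RENAMED.includes(match[1]) ? 'staticSearch_' + match[1] : token;
    })
    .join(' ');
}

console.log(migrateClassValue('staticSearch.desc'));           // "staticSearch_desc"
console.log(migrateClassValue('other staticSearch.docTitle')); // "other staticSearch_docTitle"
```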
New features and enhancements:
The JavaScript source code has now been split into several distinct source files,
and is compiled and optimized using the Google Closure Compiler at build time. See
JavaScript compilation for more information.
Support for French in captions etc. in the search page has been improved.
A French stemmer is now available, as well as a caption set for French search pages.
Images can now be configured for specific parts of a document, as well as for the
whole document, for display alongside KWICs in results. This is part of a new extension
mechanism using custom attributes.
The CSS for the search page inserted by the indexing process is now more easily accessible
in a separate file css/ssSearch.css, which is linked into the search page at build time.
Links to target documents from keyword-in-context results now include a search string
parameter that specifies the hit text, so that JavaScript running in the target page
can highlight the search hit(s) and scroll to them. See Highlighting search hits on target pages for more information.
Documentation has been significantly improved with additional explanatory remarks
for many elements, and the staticSearch build of the documentation now includes hit
highlighting (the feature described above).
Only ancestor ids are indexed when <linkToFragmentId> is enabled; formerly, any preceding id value was used.
Results can now optionally be viewed in batches by setting the new <resultsPerPage> configuration option.
The maximum number of results that a search can return has been set to 2000 results
by default and can be changed using the new <resultsLimit> element. If a search returns a set of results that exceeds this limit, staticSearch
does not render the results and advises the user to try a more precise search.
The minimum length of a word to be indexed is now configurable, so in unusual circumstances
you can now enable searching for 1- or 2-letter words using the <minWordLength> parameter.
The staticSearch report is now discussed in the documentation (see 7.9 Generated report) and the "Not in Dictionary" and "Foreign Words" reports have been improved.
The filter creation process has been rationalized such that all filter processing
happens in json.xsl, which has also improved the build performance slightly.
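The interaction between the results limit and batched display, both described in the list above, can be sketched as follows. This is an illustration under stated assumptions rather than the staticSearch implementation: prepareResults is a hypothetical function, its parameters merely mirror the <resultsLimit> and <resultsPerPage> options, and treating a resultsPerPage of 0 as meaning no batching is an assumption.

```javascript
// Hypothetical function; the parameter names mirror <resultsLimit> and <resultsPerPage>.
function prepareResults(results, resultsLimit = 2000, resultsPerPage = 0) {
  if (results.length > resultsLimit) {
    // Too many hits: render nothing and advise a more precise search.
    return { tooMany: true, batches: [] };
  }
  if (resultsPerPage <= 0) {
    // Assumption: zero (or less) means all results are shown in a single batch.
    return { tooMany: false, batches: [results] };
  }
  const batches = [];
  for (let i = 0; i < results.length; i += resultsPerPage) {
    batches.push(results.slice(i, i + resultsPerPage));
  }
  return { tooMany: false, batches };
}

const hits = Array.from({ length: 25 }, (_, i) => 'doc' + i);
const paged = prepareResults(hits, 2000, 10);
console.log(paged.batches.map((b) => b.length)); // [ 10, 10, 5 ]
```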
Bug fixes:
The encoding structure for docImage, docSortKey, and docTitle has been constrained
such that each doc* <meta> must include both a name and a class value.
Temporary XML files from dictionaries are now removed during the clean step of the build process.
All HTML characters are properly escaped in context snippets.
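For readers unfamiliar with the escaping issue mentioned in the last item, the sketch below shows the general technique in JavaScript. It is illustrative only: escapeHtml is a hypothetical function, and the source does not say where or how staticSearch performs this step.

```javascript
// Hypothetical helper: escape characters that have special meaning in HTML.
// The ampersand is replaced first so that later replacements are not double-escaped.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

console.log(escapeHtml('if a < b && c > "d"'));
// if a &lt; b &amp;&amp; c &gt; &quot;d&quot;
```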
10.12 Changes in version 1.1
New features and enhancements:
Search results can now be sorted using a user-supplied sort key. This is useful when searching only with filters (so all documents have the same
relevance score) or where many results have the same relevance score.
Using the new <linkToFragmentId> parameter, keyword-in-context results can now have individual links to the nearest ancestor
fragment id, so the searcher can go directly to the relevant section of a document.
The order of parameter elements in the configuration file is no longer fixed. The
schema now allows elements to appear in any order.
We have added a new How Do I... section in the documentation.
Phrasal searches are now case-sensitive, meaning that you can use a ‘phrasal search’
to search for proper names, by putting quotation marks around them and making the
first letter upper-case.
Indexing is faster and the size of the index is smaller because we have eliminated
the upper-case index.
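The sort key behaviour described in the first item of this list can be pictured as a tie-breaking comparator. The sketch below is an assumption about how such an ordering might work, not the staticSearch code: the shape of the result objects and the name compareResults are hypothetical.

```javascript
// Hypothetical result objects of the form { title, score, sortKey }.
function compareResults(a, b) {
  if (b.score !== a.score) {
    return b.score - a.score; // higher relevance score first
  }
  return a.sortKey.localeCompare(b.sortKey); // equal scores: fall back to the sort key
}

const results = [
  { title: 'Third', score: 1, sortKey: 'c' },
  { title: 'Top', score: 5, sortKey: 'z' },
  { title: 'First', score: 1, sortKey: 'a' },
  { title: 'Second', score: 1, sortKey: 'b' },
];

const sorted = results.slice().sort(compareResults).map((r) => r.title);
console.log(sorted); // [ 'Top', 'First', 'Second', 'Third' ]
```

When a search uses only filters, every document has the same score, so the whole ordering is decided by the sort key.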