Japan Search RDF Model Primer (unofficial)

Japan Search is a national, cross-sectoral portal that aggregates metadata from local and national libraries, museums and archives, as well as anime and broadcast programs. It is intended to be the starting point for searching a variety of creative content in Japan.

Japan Search data model
When, Where, Who, What
Some More Hints
EasySPARQL

Japan Search data model

Japan Search employs a hybrid of two databases: an Elasticsearch engine and an RDF store. While the former provides fast search and some preset tools over aggregated data, the latter adds normalized data for sophisticated SPARQL queries.

Japan Search's RDF model (JPS RDF) consists of two major parts: (1) content description with normalized data, and (2) metadata about the original (aggregated) data and access to the content. The normalized part (1) has a set of simple schema.org descriptions and corresponding structured descriptions.

Simple schema.org description

Because Japan Search aggregates a variety of data from many sources, JPS RDF normalizes the data by when (year based), where (prefecture or country based), who and what, for easy access. Those values are described with simple schema.org properties, e.g. schema:temporal, schema:spatial, schema:creator (or schema:contributor) and schema:about.

For example, "a work related to Okayama prefecture" is expressed as a simple triple in Fig. 1 (place:岡山 is the QName of the normalized URI for Okayama prefecture).

To make these data easier to use for non-Japanese users, normalized resources in this model have English labels as schema:name with the language tag "en" (Fig. 2).

Each record (an item describing a work) in Japan Search has a label (rdfs:label) and a type. A query to find 陶磁 (ceramic) from Okayama prefecture looks like this:
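A minimal sketch of such a query follows. The PREFIX IRIs and the class name type:陶磁 are assumptions based on the naming used in this primer, and should be checked against the endpoint before relying on them.

```sparql
# Prefix IRIs are assumed, not confirmed by the text above.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <https://schema.org/>
PREFIX type: <https://jpsearch.go.jp/term/type/>
PREFIX place: <https://jpsearch.go.jp/entity/place/>

SELECT ?cho ?label WHERE {
  ?cho a type:陶磁 ;                 # record typed as ceramic (assumed class URI)
       rdfs:label ?label ;           # every record carries a label
       schema:spatial place:岡山 .   # "where" normalized to Okayama prefecture
}
LIMIT 100
```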

A type URI also has an English schema:name (see the available classes, labels and super classes). Some records provide an English schema:name as well, depending on the data provider. Therefore, the following query will get the same results as above, with optional English names (do not forget to add language tags).
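One way to write this is sketched below. The English label strings "Ceramics" and "Okayama" are placeholders I assume for illustration; the actual labels stored in JPS RDF may differ, and the PREFIX IRIs are likewise assumptions.

```sparql
# Prefix IRIs and the two English label strings are assumptions.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <https://schema.org/>

SELECT ?cho ?label ?name WHERE {
  ?cho a ?type ;
       rdfs:label ?label ;
       schema:spatial ?place .
  ?type schema:name "Ceramics"@en .    # assumed English class label
  ?place schema:name "Okayama"@en .    # assumed English place label
  OPTIONAL {
    ?cho schema:name ?name .
    # langMatches accepts "en" as well as sub-tags such as "en-jp"
    FILTER langMatches(lang(?name), "en")
  }
}
LIMIT 100
```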

Note that the FILTER uses langMatches rather than a simple lang comparison. When a record has no English label, JPS RDF tries to generate one from the Japanese kana transcription if available, and those machine-generated English names are tagged as "en-jp". Because langMatches appears to be quite heavy, we will instead use lang(?name)="en" || lang(?name)="en-jp" from here on.

Other simple schema.org properties will show you additional information on those items. Here we add schema:temporal and schema:creator as OPTIONAL, since "where", "when" and "who" are not necessarily all present in a single record.
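A sketch of the extended query, under the same assumptions about PREFIX IRIs and the class name type:陶磁:

```sparql
# Prefix IRIs are assumed, not confirmed by the text above.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <https://schema.org/>
PREFIX type: <https://jpsearch.go.jp/term/type/>
PREFIX place: <https://jpsearch.go.jp/entity/place/>

SELECT ?cho ?label ?name ?when ?who WHERE {
  ?cho a type:陶磁 ;
       rdfs:label ?label ;
       schema:spatial place:岡山 .
  OPTIONAL { ?cho schema:temporal ?when }   # "when", if recorded
  OPTIONAL { ?cho schema:creator ?who }     # "who", if recorded
  OPTIONAL {
    ?cho schema:name ?name .
    # cheaper alternative to langMatches, as described above
    FILTER (lang(?name) = "en" || lang(?name) = "en-jp")
  }
}
LIMIT 100
```

Each OPTIONAL block is independent, so a record that lacks a creator still appears with its temporal value, and vice versa.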

The result will show you how "when" and "who" information is organized. Temporal information is normalized in the time: namespace, with the year as its local name (e.g. time:1619). A time range is expressed as e.g. time:1500-1599 (which is equivalent to the 16th century). A person (agent) is identified in the namespace chname: if normalized (cultural heritage name), and in ncname: if not (non-controlled name), with the Japanese name as its local part. IRIs in the chname: namespace also have an English schema:name (see the normalized name index).

(Note the remaining examples omit PREFIX clauses for simplicity. They should work without them in this environment)

Structured description with JPS RDF properties

JPS RDF defines several properties to express structured descriptions (we use the jps: prefix for them). A structured description supplements the simple schema.org description: the jps:relationType property refines the relationship, and jps:value connects the node to the value of the corresponding simple description.

Since schema:spatial points to any place related to the subject, its results can include items created at the place as well as items exhibited there. You can refine the query with the structured property to find "Ceramics created at Okayama".
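The refinement can be sketched as follows. jps:spatial, jps:relationType, jps:value and role:制作 are the names used elsewhere in this primer; the PREFIX IRIs and the class name type:陶磁 are assumptions.

```sparql
# Prefix IRIs are assumed, not confirmed by the text above.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX jps: <https://jpsearch.go.jp/term/property#>
PREFIX type: <https://jpsearch.go.jp/term/type/>
PREFIX place: <https://jpsearch.go.jp/entity/place/>
PREFIX role: <https://jpsearch.go.jp/term/role/>

SELECT ?cho ?label WHERE {
  ?cho a type:陶磁 ;
       rdfs:label ?label ;
       jps:spatial [
         jps:relationType role:制作 ;   # relation refined to "Creation"
         jps:value place:岡山           # same value as the simple schema:spatial
       ] .
}
LIMIT 100
```

This form matches only the unqualified role:制作, so records whose role carries a refinement descriptor would be missed.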

The structured property for "who" is jps:agential, which corresponds to schema:creator, schema:contributor and schema:publisher, as well as schema:about (about whom). You will be able to find all resources related to Katsushika Hokusai (葛飾北斎) with a single query.
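Such a query might look like the sketch below. The PREFIX IRIs are assumptions; chname:葛飾北斎 follows the naming rule for normalized names described earlier.

```sparql
# Prefix IRIs are assumed, not confirmed by the text above.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX jps: <https://jpsearch.go.jp/term/property#>
PREFIX chname: <https://jpsearch.go.jp/entity/chname/>

SELECT ?cho ?label ?role WHERE {
  ?cho rdfs:label ?label ;
       jps:agential [
         jps:value chname:葛飾北斎 ;    # the person, whatever the relationship
         jps:relationType ?role        # how the person relates to the item
       ] .
}
LIMIT 100
```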

Note that some ?role values in the result set have refinement descriptors, e.g. 制作.作画 (Creation--Drawing / Painting), depending on the aggregated data. Those will be discussed later in this document.

Access information and source information

The second part of JPS RDF model mentioned above is information to access the content, and metadata about the provided source (aggregated) data, described by jps:accessInfo and jps:sourceInfo respectively. The providers of access and source are expressed using schema:provider.

Fig.4 shows that the ceramic created at place:岡山 (Okayama) is exhibited at chname:九州国立博物館 (Kyushu National Museum), and its source data is provided by chname:ColBase (Integrated Collections Database of the National Museums, Japan). (The actual data also describes the museum as spatial information)

With this structure, you can query "ceramics accessible at Kyushu National Museum".
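A sketch of that query, assuming the PREFIX IRIs below and the class name type:陶磁:

```sparql
# Prefix IRIs are assumed, not confirmed by the text above.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <https://schema.org/>
PREFIX jps: <https://jpsearch.go.jp/term/property#>
PREFIX type: <https://jpsearch.go.jp/term/type/>
PREFIX chname: <https://jpsearch.go.jp/entity/chname/>

SELECT ?cho ?label WHERE {
  ?cho a type:陶磁 ;
       rdfs:label ?label ;
       jps:accessInfo [
         schema:provider chname:九州国立博物館   # where the content is accessible
       ] .
}
LIMIT 100
```

Replacing jps:accessInfo with jps:sourceInfo, and the museum with chname:ColBase, would instead select by the provider of the source data.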

Access information also provides URLs of the content image, landing page, etc., as well as other finding aids, e.g. a local catalogue number or geographical information. Source information has the jps:sourceData property, which points to JSON data converted from the provider's original source data.

When, Where, Who, What

Here we will discuss some sophisticated queries using JPS RDF structure.

When: Time interval and Era

In JPS RDF, temporal information is normalized by year, and has structured data e.g. time interval.

To find works in year 1192, use time resource URI time:1192 as the object of query triple pattern.
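For illustration, a minimal sketch with assumed PREFIX IRIs:

```sparql
# Prefix IRIs are assumed, not confirmed by the text above.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <https://schema.org/>
PREFIX time: <https://jpsearch.go.jp/entity/time/>

SELECT ?cho ?label WHERE {
  ?cho rdfs:label ?label ;
       schema:temporal time:1192 .   # exact year resource as the object
}
LIMIT 100
```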

While some works have explicit year values, many have a century or an era as their temporal information. In that case, for example for works created in the 12th century, the time resource has a URI with the year range as its local name, e.g. time:1100-1199.
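The corresponding query only swaps the object. This is a sketch with assumed PREFIX IRIs:

```sparql
# Prefix IRIs are assumed, not confirmed by the text above.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <https://schema.org/>
PREFIX time: <https://jpsearch.go.jp/entity/time/>

SELECT ?cho ?label WHERE {
  ?cho rdfs:label ?label ;
       schema:temporal time:1100-1199 .   # the 12th century as a single resource
}
LIMIT 100
```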

Although the above query retrieves items whose temporal value is the 12th century, its result set does not contain items tagged with an explicit year such as time:1192. To capture both cases, an interval search is required.

All time resources in JPS RDF have a start and an end year, i.e. they represent time intervals. Even explicit year resources such as time:1192 have both, with the same value for each.

time:1192 jps:start 1192; jps:end 1192 .
time:1100-1199 jps:start 1100; jps:end 1199 .

With this structure in mind, the query to retrieve all items created in (or related to) the 12th century will be as follows.
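A sketch of the interval search, using the jps:start and jps:end properties shown above; the PREFIX IRIs are assumptions.

```sparql
# Prefix IRIs are assumed, not confirmed by the text above.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <https://schema.org/>
PREFIX jps: <https://jpsearch.go.jp/term/property#>

SELECT ?cho ?label ?when WHERE {
  ?cho rdfs:label ?label ;
       schema:temporal ?when .
  ?when jps:start ?start ;
        jps:end ?end .
  # keep intervals that lie entirely inside the 12th century,
  # which covers both time:1192 and time:1100-1199
  FILTER (?start >= 1100 && ?end <= 1199)
}
LIMIT 100
```

The filter tests containment. To also include intervals that merely overlap the century, the condition could be changed to `?start <= 1199 && ?end >= 1100`.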

An era is also defined as a time interval, identified by a URI with its Japanese label as the local name, e.g. time:鎌倉時代 (Kamakura Period). Because many historical items have both an era and a century in their temporal description, JPS RDF introduces the jps:era property for the era value. If a resource has only an era description, both jps:era and jps:value (hence schema:temporal) have the same era value.

The following query will find works in Kamakura period.
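One possible form is sketched below. The name jps:temporal for the structured "when" node is my assumption, by analogy with jps:spatial and jps:agential; the PREFIX IRIs are assumptions as well.

```sparql
# Prefix IRIs and the property name jps:temporal are assumptions.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX jps: <https://jpsearch.go.jp/term/property#>
PREFIX time: <https://jpsearch.go.jp/entity/time/>

SELECT ?cho ?label WHERE {
  ?cho rdfs:label ?label ;
       jps:temporal [
         jps:era time:鎌倉時代   # era value, present even when jps:value is a century
       ] .
}
LIMIT 100
```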

Where: Location and Geohash

Location or spatial information is normalized by prefecture (or country if not in Japan). If more precise description e.g. city or street is provided, the original value is preserved in schema:description of the structured node.

When possible, the JPS RDF normalizer estimates lat/long for the location and adds schema:geo to the structured node. It also calculates a Geohash from the lat/long values and generates a URI with a 4-6 digit Geohash, depending on the lat/long accuracy. This URI is the value of the jps:within property.

For example, a record of a broadcast program filmed at Uji City, Kyoto, has lat/long values of the City (not accurate location), and the calculated Geohash URI.

<https://jpsearch.go.jp/data/michi-D0004160010_00000>
    jps:spatial [
        schema:geo [
            schema:latitude 34.8829 ;
            schema:longitude 135.79915 ;
            jps:note "京都府宇治市の緯度経度"      #lat/long of Uji City, Kyoto
        ] ;
        jps:within <http://geohash.org/xn0w6> ;
        schema:description "撮影地住所: 宇治市宇治蓮華" #filmed location: Uji Renge, Uji City
    ] .

The graph of the structured node is shown in Fig. 7.

A Geohash value represents an area; the more digits it has, the narrower (more precise) the area. In general, 4, 5 and 6 digits correspond to cells of roughly ±20 km, ±2.4 km and ±0.61 km respectively. If two Geohashes share the same prefix, both lie within the area represented by that prefix.

Geohash URIs provide a handy way to find nearby contents.
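For instance, items sharing the 5-digit cell of the Uji City record above could be retrieved as sketched here (PREFIX IRIs assumed):

```sparql
# Prefix IRIs are assumed, not confirmed by the text above.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX jps: <https://jpsearch.go.jp/term/property#>

SELECT ?cho ?label WHERE {
  ?cho rdfs:label ?label ;
       # structured spatial node, then its Geohash cell
       jps:spatial/jps:within <http://geohash.org/xn0w6> .
}
LIMIT 100
```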

Each Geohash URI is related to the URI one digit shorter with jps:within. Using the SPARQL 1.1 property path operator (+), a single query can retrieve items in a wider area.
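A sketch of the wider-area search, with assumed PREFIX IRIs; the 4-digit hash is the prefix of the Uji City example:

```sparql
# Prefix IRIs are assumed, not confirmed by the text above.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX jps: <https://jpsearch.go.jp/term/property#>

SELECT ?cho ?label WHERE {
  ?cho rdfs:label ?label ;
       # one or more jps:within steps: node -> xn0w6 -> xn0w, or node -> xn0w
       jps:spatial/jps:within+ <http://geohash.org/xn0w> .
}
LIMIT 100
```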

Note the above Geohash has only 4 digits, and + operator is appended to jps:within.

Who: JPS normalized name and LOD

As explained above, the schema:name label can be used for queries in English. This works well for temporal and spatial values, but not very well for person names, because Latin transcriptions of Japanese names are not uniform. For example, "小野道風" can be transcribed as "Ono Tofu", "Ono no Toufuu", "Ono no Tōfū" or "Ono Tôfû", and, even worse, it is also read as "Ono Michikaze". JPS normalized names are related to external LOD authorities such as NDLA, DBpedia Japan and Wikidata with owl:sameAs when possible. If you know that the Wikidata ID for "小野道風" is Q1439775, you can write a query with more confidence.
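A sketch of such a query follows. It assumes that the owl:sameAs link uses the usual Wikidata entity IRI form (http://www.wikidata.org/entity/...), and that the schema: prefix IRI below matches the endpoint.

```sparql
# Prefix IRIs and the Wikidata IRI form are assumptions.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX schema: <https://schema.org/>

SELECT ?cho ?label ?who WHERE {
  # resolve the normalized name through its Wikidata identity
  ?who owl:sameAs <http://www.wikidata.org/entity/Q1439775> .
  ?cho rdfs:label ?label ;
       schema:creator ?who .
}
LIMIT 100
```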

What: Class hierarchy

"What" can mean either "what type is the content" or "what is the content about". The former is expressed by rdf:type, and the latter by schema:about.

A type query is simple. If you want to get paintings (type:絵画) related to, say, Kanagawa (place:神奈川), it will be as follows.
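A minimal sketch, with assumed PREFIX IRIs:

```sparql
# Prefix IRIs are assumed, not confirmed by the text above.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <https://schema.org/>
PREFIX type: <https://jpsearch.go.jp/term/type/>
PREFIX place: <https://jpsearch.go.jp/entity/place/>

SELECT ?cho ?label WHERE {
  ?cho a type:絵画 ;                  # exactly the Painting class
       rdfs:label ?label ;
       schema:spatial place:神奈川 .  # related to Kanagawa
}
LIMIT 100
```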

However, the type description of each item is based on the provider's data, so the same work by Hokusai could be a "Painting" in one dataset and a "Print" in another. The types in JPS RDF are defined as a class tree, so you can use the subclass relationship to query such cases.
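A subclass-aware query could be sketched like this. Restricting it to works by chname:葛飾北斎 is my choice for illustration, and the PREFIX IRIs are assumptions.

```sparql
# Prefix IRIs are assumed, not confirmed by the text above.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <https://schema.org/>
PREFIX type: <https://jpsearch.go.jp/term/type/>
PREFIX chname: <https://jpsearch.go.jp/entity/chname/>

SELECT ?cho ?label ?type WHERE {
  ?cho a ?type ;
       rdfs:label ?label ;
       schema:creator chname:葛飾北斎 .
  # zero or more subclass steps: Painting itself or any class below it
  ?type rdfs:subClassOf* type:絵画 .
}
LIMIT 100
```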

You will see some works have class type:絵画 (Painting), while others have type:版画 (Print) or type:水彩 (Water color)

The values of schema:about are generated from several fields (subject, genre, keyword, ...) of the provided data. If the data has a URI from a controlled vocabulary, JPS RDF uses the URI itself. If not, JPS RDF assigns a URI in the keyword: namespace with the value as its local name. Although the keyword: namespace is not controlled and may have name conflicts, some frequently used terms are useful for general search. Those terms, including many of the controlled ones, also have English schema:name labels.

Some More Hints

Here we introduce some more practical hints to construct useful queries.

Role hierarchy

The values of jps:relationType (explained in the previous section) are generated from field labels of the original data (publication place, author, etc., though in Japanese) or from suffixes in the data (Alice, ed., Bob, trans.). While those values are normalized when possible, many variants remain.

In order to make it easier to use in query, JPS RDF organizes those values in 9 groups, namely role:制作 (Creation), role:公開 (Made public), role:発見 (Discovery), role:取得 (Acquisition), role:記録 (Record), role:出演 (Appearance), role:内容 (Content), role:支援 (Support) and role:関連 (Related), and each individual role taken from data is assigned as qualifier of those basic roles, e.g. role:制作.作曲 (Creation -- Composition) or role:公開.出版 (Made public -- Publication). In simple cases, the value could be non-qualified basic URI.

Each qualified role URI and its basic URI are related with skos:broader. This is useful to find contents created in, say, Okayama.

Since the role hierarchy has only two tiers, you only need to append the zero-or-one path operator (?) to skos:broader.
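A sketch of "contents created in Okayama" that accepts both the basic role and its qualified variants; the PREFIX IRIs are assumptions.

```sparql
# Prefix IRIs are assumed, not confirmed by the text above.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX jps: <https://jpsearch.go.jp/term/property#>
PREFIX place: <https://jpsearch.go.jp/entity/place/>
PREFIX role: <https://jpsearch.go.jp/term/role/>

SELECT ?cho ?label WHERE {
  ?cho rdfs:label ?label ;
       jps:spatial ?node .
  ?node jps:value place:岡山 ;
        # role:制作 itself (zero steps) or a qualified role one step below it
        jps:relationType/skos:broader? role:制作 .
}
LIMIT 100
```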

Images and IIIF Manifests

Thumbnails are described with schema:image.

If high-resolution images or other media objects (e.g. video, 3D, etc.) are provided in viewing systems, they are described as a schema:url of the access information. If the URL is an IIIF Manifest, the resource URI has the type sc:Manifest. The following query will list those IIIF Manifests.
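A sketch of the listing query. The sc: prefix is taken to be the IIIF Presentation API 2 namespace, and the remaining PREFIX IRIs are assumptions.

```sparql
# Prefix IRIs are assumed, not confirmed by the text above.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <https://schema.org/>
PREFIX jps: <https://jpsearch.go.jp/term/property#>
PREFIX sc: <http://iiif.io/api/presentation/2#>

SELECT ?cho ?label ?manifest WHERE {
  ?cho rdfs:label ?label ;
       jps:accessInfo/schema:url ?manifest .   # URL given in the access information
  ?manifest a sc:Manifest .                    # keep only URLs typed as IIIF Manifests
}
LIMIT 100
```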

Federated Query

The Japan Search SPARQL endpoint supports Federated Query. Together with the links from normalized names to LOD, this lets you crosswalk Japan Search and Europeana, for example.

The following query will retrieve the union of results from Europeana (federated) and Japan Search for works by 喜多川歌麿 (Kitagawa, Utamaro). DBpedia URI in JPS RDF can be specified by rdfs:isDefinedBy. Some of Europeana data have DBpedia names in Europeana's Proxy and image information in data provider's Proxy.

(Note we need to declare some namespaces for Europeana, which are not predefined in this environment)

EasySPARQL

EasySPARQL is a simple interface to use SPARQL, where the Japan Search endpoint accepts REST parameters, constructs a query on behalf of a user, and returns a SPARQL results set.

Request parameters in EasySPARQL

EasySPARQL request parameters

| parameter | value | note |
| --- | --- | --- |
| `when` | YYYY[,YYYY] \| era name (J) | YYYY is a year, optionally followed by an end year |
| `where` | Prefecture name (J) \| lat,long | lat, long in decimal form |
| `who` | normalized name | with prefix `~`, labels (including English ones) are searched, though only by exact match |
| `what` | class name (J) | with prefix `~`, super classes included |
| `keyword` | subject/keyword | with prefix `=`, exact match |
| `title` | string in rdfs:label or schema:name | with prefix `=`, exact match for rdfs:label |
| `text` | any text | searches all literal values |
| `format` | json \| xml | return format (default depends on the Accept header) |
| `limit` | max returns | default is system dependent |

(J) means only Japanese values are meaningful. Parameters can be mixed. For example, to search contents related to 葛飾北斎, use the following request. Note the prefix ~ and %2C+ (", ") to delimit family and given name.

https://jpsearch.go.jp/rdf/sparql/easy/?who=~Katsushika%2C+Hokusai

Return values from EasySPARQL

The result set is returned in the SPARQL result format specified by the format parameter (JSON result format or XML result format). The following variables will be present in the results.

Common variables in EasySPARQL results set

| var | value |
| --- | --- |
| `s` | URI of the resource |
| `label` | literal value of `rdfs:label` |
| `creator` | URI of the value of `schema:creator` |
| `type` | class URI (value of `rdf:type`) |

Also the following variables may be present depending on request.

Request-specific variables in EasySPARQL results set

| request | var | value |
| --- | --- | --- |
| `when` | `when` | URI of year/era (value of `schema:temporal`) |
| `where` | `lat` `long` | numbers of latitude and longitude |
| `who` | - | (no additional variable) |
| `what` | - | (no additional variable) |
| `keyword` | `class` | URI of keyword/subject (value of `schema:about`) |
| `title` | - | (no additional variable) |
| `text` | - | (no additional variable) |

Consult each specification for the details of the format.