Japan Search RDF Model Primer (unofficial)
Japan Search is a national, cross-sectoral integrated portal that collects metadata from local and national libraries, museums and archives, as well as anime and broadcast programs. It is intended as the starting point for searching a wide variety of creative content in Japan.
Japan Search data model
Japan Search employs a hybrid of two databases: an Elasticsearch search engine and an RDF store. While the former provides fast search and some pre-set tools on aggregated data, the latter adds normalized data for sophisticated SPARQL queries.
Japan Search's RDF model (JPS RDF) comprises two major parts: (1) content description with normalized data, and (2) metadata about the original (aggregated) data and content access. The normalized part (1) has a set of simple schema.org descriptions and corresponding structured descriptions.
Simple schema.org description
Because Japan Search aggregates a variety of data from many sources, JPS RDF normalizes data based on when (year base), where (prefecture or country base), who and what for easy access. Those values are described with simple schema.org properties, e.g. schema:temporal, schema:spatial, schema:creator (or schema:contributor) and schema:about.
For example, "a work related to Okayama prefecture" is expressed as a simple triple in Fig.1 (place:岡山 is the QName of the normalized URI for Okayama prefecture).
To make those data easier to use for non-Japanese users, normalized resources in this model have English labels as schema:name with the language tag "en" (Fig.2).
Each item also has a label (rdfs:label) and a type. A query to find 陶磁 (ceramics) from Okayama prefecture therefore matches on the item type together with schema:spatial place:岡山.
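As a minimal sketch of such a query, the Python helper below assembles the SPARQL text from a type QName and a place QName. The QName type:陶磁, the variable names and the LIMIT are assumptions made for illustration, and the PREFIX declarations are left out on the assumption that the endpoint predefines them.

```python
def items_by_type_and_place(type_qname: str, place_qname: str, limit: int = 20) -> str:
    """Build a SPARQL query for items of one type related to one place.

    Assumes the endpoint predefines the type:, place:, schema: and rdfs: prefixes.
    """
    return (
        "SELECT ?cho ?label WHERE {\n"
        f"  ?cho a {type_qname} ;\n"
        f"       schema:spatial {place_qname} ;\n"
        "       rdfs:label ?label .\n"
        "}\n"
        f"LIMIT {limit}"
    )


# Ceramics (type:陶磁, assumed QName) related to Okayama prefecture.
query = items_by_type_and_place("type:陶磁", "place:岡山")
print(query)
```

The printed text can be pasted into the SPARQL endpoint's query form.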
A type URI also has an English schema:name (see available classes, labels and super classes). Some records provide an English schema:name as well (depending on the data provider). Therefore, the following query will get the same results as above, with optional English names (do not forget to add language tags).
Note that the FILTER uses langMatches rather than a simple lang comparison, because JPS RDF tries to add English labels from Japanese Kana transcriptions when they are available and no English label exists, and those machine-generated English names are tagged as "en-jp". Since langMatches appears to be quite heavy, we will instead use lang(?name)="en" || lang(?name)="en-jp" from here on.
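The difference between the two filters can be illustrated outside SPARQL. The sketch below emulates the basic language-range matching that langMatches performs (RFC 4647 basic filtering) and compares it with plain equality tests; the sample labels are hypothetical.

```python
def lang_matches(tag: str, lang_range: str) -> bool:
    """Emulate SPARQL langMatches (RFC 4647 basic filtering, case-insensitive)."""
    tag, lang_range = tag.lower(), lang_range.lower()
    if lang_range == "*":
        return tag != ""
    return tag == lang_range or tag.startswith(lang_range + "-")


# Hypothetical (label, language tag) pairs for one resource.
names = [("Okayama", "en"), ("Okayama", "en-jp"), ("岡山", "ja")]

by_lang_matches = [tag for _, tag in names if lang_matches(tag, "en")]
by_exact_en = [tag for _, tag in names if tag == "en"]
by_two_tests = [tag for _, tag in names if tag in ("en", "en-jp")]

print(by_lang_matches, by_exact_en, by_two_tests)
```

A single lang(?name)="en" test would miss the "en-jp" labels, while the two-way equality test selects the same labels as langMatches for the tags used here.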
Other simple schema.org properties will show you additional information on those items. Here we add schema:temporal and schema:creator as OPTIONAL, since "where", "when" and "who" are not necessarily all present in one record.
The result will show you how "when" and "who" information is organized. Temporal information is normalized in the time: namespace, with the year as its local name (e.g. time:1619). A time range is expressed as e.g. time:1500-1599 (which is equivalent to the 16th century). A person (agent) is identified in the namespace chname: (cultural heritage name) if normalized, and ncname: (non-controlled name) if not, with the Japanese name as its local part. IRIs in the chname: namespace also have an English schema:name (see normalized name index).
(Note that the remaining examples omit PREFIX clauses for simplicity. They should work without them in this environment.)
Structured description with JPS RDF properties
JPS RDF defines several properties to express structured descriptions (we use the jps: prefix for them). A structured description supplements the simple schema.org description: it uses the jps:relationType property to refine the relationship, and it is connected to the value of the corresponding simple description with jps:value.
Since schema:spatial points to any place related to the subject, its results can include items created at the place as well as items exhibited there. You can refine the query with the structured property to find "Ceramics created at Okayama".
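A refined query of this kind might look like the sketch below, again built as text from Python. It assumes that the structured node hangs off jps:spatial, that the "creation" relationship is role:制作, and that the endpoint predefines the prefixes; the variable names are illustrative.

```python
def items_created_at(type_qname: str, place_qname: str,
                     role_qname: str = "role:制作", limit: int = 20) -> str:
    """Build a query that restricts schema:spatial matches by relation type.

    The blank node pattern ties jps:relationType and jps:value to the same
    structured node, so only that relationship to the place is matched.
    """
    return (
        "SELECT ?cho ?label WHERE {\n"
        f"  ?cho a {type_qname} ;\n"
        "       rdfs:label ?label ;\n"
        "       jps:spatial [\n"
        f"         jps:relationType {role_qname} ;\n"
        f"         jps:value {place_qname}\n"
        "       ] .\n"
        "}\n"
        f"LIMIT {limit}"
    )


query = items_created_at("type:陶磁", "place:岡山")
print(query)
```

Keeping both conditions inside one blank node matters: written as two separate triple patterns on different nodes, the role and the place could come from unrelated structured descriptions of the same item.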
The structured property for "who" is jps:agential, which corresponds to schema:creator, schema:contributor and schema:publisher, as well as schema:about (about whom). You will be able to find all resources related to Katsushika Hokusai (葛飾北斎) with a single query.
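One way to write that single query is sketched below. It matches any jps:agential node whose jps:value is the normalized name and also returns the relation type; the QName chname:葛飾北斎 and the variable names are assumptions, and prefixes are assumed to be predefined by the endpoint.

```python
def works_related_to_agent(agent_qname: str, limit: int = 50) -> str:
    """Build a query for every item linked to an agent, in whatever role."""
    return (
        "SELECT ?cho ?label ?role WHERE {\n"
        "  ?cho rdfs:label ?label ;\n"
        "       jps:agential [\n"
        f"         jps:value {agent_qname} ;\n"
        "         jps:relationType ?role\n"
        "       ] .\n"
        "}\n"
        f"LIMIT {limit}"
    )


query = works_related_to_agent("chname:葛飾北斎")
print(query)
```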
Note that some ?role values in the results set have refinement descriptors, e.g. 制作.作画 (Creation--Drawing / Painting), depending on the aggregated data. Those will be discussed later in this document.
Access information and source information
The second part of the JPS RDF model mentioned above is information to access the content and metadata about the provided source (aggregated) data, described by jps:accessInfo and jps:sourceInfo respectively. The providers of access and source are expressed using schema:provider.
Fig.4 shows that the ceramic created at place:岡山 (Okayama) is exhibited at chname:九州国立博物館 (Kyushu National Museum), and its source data is provided by chname:ColBase (Integrated Collections Database of the National Museums, Japan). (The actual data also describes the museum as spatial information.)
With this structure, you can query "ceramics accessible at Kyushu National Museum".
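Such a query can be sketched as follows. It follows jps:accessInfo to its schema:provider with a SPARQL 1.1 property path; the type QName and the variable names are assumptions, and prefixes are assumed to be predefined by the endpoint.

```python
def items_accessible_at(type_qname: str, provider_qname: str, limit: int = 20) -> str:
    """Build a query for items of a type whose access provider is given."""
    return (
        "SELECT ?cho ?label WHERE {\n"
        f"  ?cho a {type_qname} ;\n"
        "       rdfs:label ?label ;\n"
        f"       jps:accessInfo/schema:provider {provider_qname} .\n"
        "}\n"
        f"LIMIT {limit}"
    )


query = items_accessible_at("type:陶磁", "chname:九州国立博物館")
print(query)
```

Replacing jps:accessInfo with jps:sourceInfo in the path would instead select by the provider of the source data, such as chname:ColBase.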
Access information also provides URLs of the content image, landing page etc., as well as other finding aids, e.g. local catalogue number or geographical information. Source information has a jps:sourceData property, which points to JSON data converted from the original source data from the provider.
When, Where, Who, What
Here we will discuss some sophisticated queries using JPS RDF structure.
When: Time interval and Era
In JPS RDF, temporal information is normalized by year, and has structured data e.g. time interval.
To find works from the year 1192, use the time resource URI time:1192 as the object of the query triple pattern.
While some works have explicit year values, many have only a century or an era as temporal information. In that case, for example for works created in the 12th century, the time resource has a URI with the year range as its local name, e.g. time:1100-1199.
Although the above query can retrieve items whose temporal value is the 12th century, the results set does not contain items with the explicit value time:1192. To capture both cases, an interval search is required.
All time resources in JPS RDF have a start and an end year, i.e. they represent time intervals (even explicit year resources such as time:1192 have both, with the same value).
    time:1192 jps:start 1192; jps:end 1192 .
    time:1100-1199 jps:start 1100; jps:end 1199 .
With this structure in mind, the query to retrieve all items created in (or related to) 12th century will be as follows
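A sketch of such an interval query is given below. It assumes the intended test is containment (the resource's interval lies inside the requested one), reaches the time resource through schema:temporal, and relies on predefined prefixes. The small dictionary repeats the two example resources above to show that both pass the same test.

```python
def items_within_years(start: int, end: int, limit: int = 50) -> str:
    """Build a query for items whose time interval lies within [start, end]."""
    if start > end:
        raise ValueError("start year must not be later than end year")
    return (
        "SELECT ?cho ?label ?when WHERE {\n"
        "  ?cho rdfs:label ?label ;\n"
        "       schema:temporal ?when .\n"
        "  ?when jps:start ?start ;\n"
        "        jps:end ?end .\n"
        f"  FILTER(?start >= {start} && ?end <= {end})\n"
        "}\n"
        f"LIMIT {limit}"
    )


query = items_within_years(1100, 1199)
print(query)

# The same containment test applied to the two example time resources.
TIME_RESOURCES = {"time:1192": (1192, 1192), "time:1100-1199": (1100, 1199)}
matched = [name for name, (s, e) in TIME_RESOURCES.items() if s >= 1100 and e <= 1199]
print(matched)
```

If overlapping intervals should also count (for example an era that only partly falls in the century), the filter would need to be relaxed to ?start <= 1199 && ?end >= 1100.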
An era is also defined as a time interval, identified by a URI with its Japanese label as local name, e.g. time:鎌倉時代 (Kamakura Period). Because many historical items have both era and century in their temporal description, JPS RDF introduces the jps:era property for the era value. If a resource has only an era description, both jps:era and jps:value (hence schema:temporal) have the same era value.
The following query will find works in Kamakura period.
Where: Location and Geohash
Location or spatial information is normalized by prefecture (or country if not in Japan). If a more precise description, e.g. city or street, is provided, the original value is preserved in schema:description of the structured node.
When possible, the JPS RDF normalizer estimates lat/long for the location and adds schema:geo to the structured node. It also calculates a Geohash from the lat/long values and generates a URI with a 4 to 6 digit Geohash, depending on the lat/long accuracy. This URI is the value of the jps:within property.
For example, a record of a broadcast program filmed at Uji City, Kyoto, has lat/long values of the City (not accurate location), and the calculated Geohash URI.
    <https://jpsearch.go.jp/data/michi-D0004160010_00000> jps:spatial [
        schema:geo [
            schema:latitude 34.8829 ;
            schema:longitude 135.79915 ;
            jps:note "京都府宇治市の緯度経度"  # lat/long of Uji City, Kyoto
        ] ;
        jps:within <http://geohash.org/xn0w6> ;
        schema:description "撮影地住所: 宇治市宇治蓮華"  # filmed location: Uji Renge, Uji City
    ]
The graph of the structured node is shown at Fig.7
A Geohash value represents an area; the more digits it has, the narrower (more precise) the area. In general, 4, 5 and 6 digits correspond to cells of roughly ±20 km, ±2.4 km and ±0.61 km, respectively. If two Geohashes share the same prefix, both lie within the area represented by that prefix.
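The prefix property follows from how a Geohash is computed. The sketch below is a plain implementation of the standard Geohash encoding (not the JPS normalizer itself), applied to the Uji City coordinates from the record above.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # Geohash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lon: float, precision: int) -> str:
    """Encode a latitude/longitude pair as a Geohash of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits <<= 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


uji5 = geohash_encode(34.8829, 135.79915, 5)
uji4 = geohash_encode(34.8829, 135.79915, 4)
print(uji5, uji4)
```

The 5-digit result matches the xn0w6 in the record's jps:within URI, and the 4-digit hash xn0w is simply its prefix, which is why truncating a hash yields the enclosing, wider cell.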
Geohash URIs provide a handy way to find nearby contents.
Each Geohash URI is related to the URI that is one digit shorter with jps:within. Using the SPARQL 1.1 property path operator (+), one query can retrieve items in a wider area.
Note that the above Geohash has only 4 digits, and that the + operator is appended to jps:within.
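A wider-area query along these lines can be sketched as below. It assumes the Geohash URIs have the form http://geohash.org/ followed by the hash, as in the record shown earlier, and that prefixes are predefined by the endpoint; the variable names are illustrative.

```python
GEOHASH_CHARS = set("0123456789bcdefghjkmnpqrstuvwxyz")


def items_within_geohash(geohash: str, limit: int = 50) -> str:
    """Build a query for items located in a Geohash cell or any narrower cell.

    jps:within+ follows one or more jps:within links, so a node tagged with a
    5 or 6 digit hash is still found when the query names its 4 digit prefix.
    """
    if not geohash or not set(geohash) <= GEOHASH_CHARS:
        raise ValueError(f"not a Geohash: {geohash!r}")
    return (
        "SELECT ?cho ?label WHERE {\n"
        "  ?cho rdfs:label ?label ;\n"
        f"       jps:spatial/jps:within+ <http://geohash.org/{geohash}> .\n"
        "}\n"
        f"LIMIT {limit}"
    )


query = items_within_geohash("xn0w")
print(query)
```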
Who: JPS normalized name and LOD
As explained above, the schema:name label can be used for English queries. This works well for temporal and spatial values, but not very well for person names, because Latin transcriptions of Japanese names are not uniform: for example, "小野道風" can be transcribed as "Ono Tofu", "Ono no Toufuu", "Ono no Tōfū" or "Ono Tôfû", and, even worse, it is also read as "Ono Michikaze". JPS normalized names are related to external LOD authorities such as NDLA, DBpedia Japan and Wikidata with owl:sameAs when possible. If you know that the Wikidata ID for "小野道風" is Q1439775, you can write a query with more confidence.
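The sketch below shows one way to phrase such a query. It assumes that the owl:sameAs link uses the usual Wikidata entity URI form (http://www.wikidata.org/entity/ plus the ID), that the work is linked through schema:creator, and that the owl:, schema: and rdfs: prefixes are predefined by the endpoint.

```python
import re


def works_by_wikidata_id(qid: str, limit: int = 50) -> str:
    """Build a query for works whose creator is linked to a Wikidata entity."""
    if not re.fullmatch(r"Q[1-9][0-9]*", qid):
        raise ValueError(f"not a Wikidata item ID: {qid!r}")
    return (
        "SELECT ?cho ?label ?who WHERE {\n"
        f"  ?who owl:sameAs <http://www.wikidata.org/entity/{qid}> .\n"
        "  ?cho rdfs:label ?label ;\n"
        "       schema:creator ?who .\n"
        "}\n"
        f"LIMIT {limit}"
    )


query = works_by_wikidata_id("Q1439775")  # 小野道風
print(query)
```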
What: Class hierarchy
When talking about "what", it could mean either "what type is the content" or "what is the content about". The former is expressed by rdf:type, and the latter by schema:about.
A type query is simple. If you want to get paintings (type:絵画) related to, say, Kanagawa (place:神奈川), it will be as follows.
However, the type description of each item is based on the provider's data, so the same work by Hokusai could be a "Painting" in one data set and a "Print" in another. The types in JPS RDF are defined as a class tree, so you can use the subclass relationship to query such cases.
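A subclass-aware query can be sketched as follows. It assumes the class tree is expressed with rdfs:subClassOf, uses the zero-or-more path operator (*) so that the named class itself is included, and relies on predefined prefixes; the creator QName is an assumption.

```python
def works_of_class_tree(class_qname: str, creator_qname: str, limit: int = 50) -> str:
    """Build a query for a creator's works typed with a class or any subclass."""
    return (
        "SELECT ?cho ?label ?type WHERE {\n"
        "  ?cho rdfs:label ?label ;\n"
        f"       schema:creator {creator_qname} ;\n"
        "       rdf:type ?type .\n"
        f"  ?type rdfs:subClassOf* {class_qname} .\n"
        "}\n"
        f"LIMIT {limit}"
    )


query = works_of_class_tree("type:絵画", "chname:葛飾北斎")
print(query)
```

Returning ?type makes the provider-specific class of each work visible in the results.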
You will see that some works have the class type:絵画 (Painting), while others have type:版画 (Print) or type:水彩 (Water color).
The values of schema:about are generated from several fields (subject, genre, keyword...) of the provided data. If the data has a URI from a controlled vocabulary, JPS RDF uses the URI itself. If not, JPS RDF assigns a URI in the keyword: namespace with the value as local name. Although the keyword: namespace is not controlled and may have name conflicts, some frequently used terms are useful for general search. Those terms, including many of the controlled ones, have English schema:name labels, too.
Some More Hints
Here we introduce some more practical hints to construct useful queries.
Role hierarchy
The values of jps:relationType (explained in the previous section) are generated from field labels of the original data (publication place, author, etc., though in Japanese) or from suffixes in the data (Alice, ed., Bob, trans.). While those values are normalized when possible, many variants remain.
To make them easier to use in queries, JPS RDF organizes those values in 9 groups, namely role:制作 (Creation), role:公開 (Made public), role:発見 (Discovery), role:取得 (Acquisition), role:記録 (Record), role:出演 (Appearance), role:内容 (Content), role:支援 (Support) and role:関連 (Related). Each individual role taken from the data is assigned as a qualifier of those basic roles, e.g. role:制作.作曲 (Creation -- Composition) or role:公開.出版 (Made public -- Publication). In simple cases, the value can be the non-qualified basic URI.
Each qualified role URI and its basic URI are related with skos:broader. This is useful to find contents created in, say, Okayama.
Since the roles have only a two-tier hierarchy, you just need to add the zero-or-one unary operator (?) to skos:broader.
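Put together, a query for contents created in Okayama under any creation-related role might be sketched as below. The structure follows the jps:spatial pattern described earlier; the variable names are illustrative, and the role:, place:, jps:, skos: and rdfs: prefixes are assumed to be predefined by the endpoint.

```python
def items_with_basic_role(place_qname: str, basic_role_qname: str = "role:制作",
                          limit: int = 50) -> str:
    """Build a query matching a basic role or any of its qualified roles.

    skos:broader? matches zero or one skos:broader step, so both role:制作
    itself and qualified roles such as role:制作.作曲 satisfy the pattern.
    """
    return (
        "SELECT ?cho ?label WHERE {\n"
        "  ?cho rdfs:label ?label ;\n"
        "       jps:spatial [\n"
        f"         jps:relationType/skos:broader? {basic_role_qname} ;\n"
        f"         jps:value {place_qname}\n"
        "       ] .\n"
        "}\n"
        f"LIMIT {limit}"
    )


query = items_with_basic_role("place:岡山")
print(query)
```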
Images and IIIF Manifests
Thumbnails are described with schema:image.
If high resolution images or other media objects (e.g. video, 3D, etc.) are provided in viewing systems, they are described as a schema:url of the access information. If it is a IIIF Manifest, the resource URI has the type sc:Manifest. The following query will list those IIIF Manifests.
Federated Query
The Japan Search SPARQL endpoint supports Federated Query. Together with the links from normalized names to LOD, this lets you crosswalk Japan Search and Europeana, for example.
The following query will retrieve the union of results from Europeana (federated) and Japan Search for works by 喜多川歌麿 (Kitagawa, Utamaro). The DBpedia URI in JPS RDF can be specified by rdfs:isDefinedBy. Some of the Europeana data have DBpedia names in Europeana's Proxy and image information in the data provider's Proxy.
(Note we need to declare some namespaces for Europeana, which are not predefined in this environment)
EasySPARQL
EasySPARQL is a simple interface to use SPARQL, where the Japan Search endpoint accepts REST parameters, constructs a query on behalf of a user, and returns a SPARQL results set.
Request parameters in EasySPARQL
parameter | value | note |
---|---|---|
when | YYYY[,YYYY] or era name (J) | YYYY is a year (optionally followed by an end year) |
where | prefecture name (J) or lat,long | lat and long in decimal form |
who | normalized name | with prefix ~ , labels including English ones are searched (exact match only) |
what | class name (J) | with prefix ~ , super classes are included |
keyword | subject/keyword | with prefix = , exact match |
title | string in rdfs:label or schema:name | with prefix = , exact match for rdfs:label |
text | any text | searches all literal values |
format | json or xml | return format (default depends on the Accept header) |
limit | max returns | default is system dependent |
(J) means only Japanese values are meaningful. Parameters can be mixed. For example, to search contents related to 葛飾北斎, use the following request. Note the prefix ~ and the %2C+ (", ") that delimits family and given name.
https://jpsearch.go.jp/rdf/sparql/easy/?who=~Katsushika%2C+Hokusai
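Such request URLs can also be assembled programmatically. The following sketch uses only the Python standard library; the helper name is hypothetical, and it assumes Python 3.7 or later, where the ~ character is left unescaped by urlencode.

```python
from urllib.parse import urlencode

EASY_ENDPOINT = "https://jpsearch.go.jp/rdf/sparql/easy/"
ALLOWED = {"when", "where", "who", "what", "keyword", "title", "text", "format", "limit"}


def easy_sparql_url(**params: str) -> str:
    """Build an EasySPARQL request URL from the documented parameters."""
    unknown = set(params) - ALLOWED
    if unknown:
        raise ValueError(f"unknown EasySPARQL parameter(s): {sorted(unknown)}")
    # urlencode percent-encodes the comma and turns the space into '+'.
    return EASY_ENDPOINT + "?" + urlencode(params)


url = easy_sparql_url(who="~Katsushika, Hokusai")
print(url)

# Parameters can be mixed, e.g. a class name together with a return format.
mixed = easy_sparql_url(what="絵画", format="json")
print(mixed)
```

The first URL reproduces the request shown above, including the %2C+ delimiter between family and given name.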
Return values from EasySPARQL
The results set will be returned in the SPARQL result format specified by the format parameter (JSON result format or XML result format). The following variables will be present in the results.
var | value |
---|---|
s | URI of the resource |
label | literal value of rdfs:label |
creator | URI of the value of schema:creator |
type | class URI (value of rdf:type ) |
Also the following variables may be present depending on request.
request | var | value |
---|---|---|
when | when | URI of year/era (value of schema:temporal ) |
where | lat long | numbers of latitude and longitude |
who | - | (no additional variable) |
what | - | (no additional variable) |
keyword | class | URI of keyword/subject (value of schema:about ) |
title | - | (no additional variable) |
text | - | (no additional variable) |
Consult each specification for the detail of the format.
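As an illustration of the JSON result format, the sketch below flattens a response into plain dictionaries keyed by variable name. The sample document follows the standard SPARQL JSON results layout, but its URIs and label are made-up placeholders, not real Japan Search data.

```python
import json

# Hypothetical response body; note that "creator" is unbound in this row.
SAMPLE = """
{
  "head": {"vars": ["s", "label", "creator", "type"]},
  "results": {"bindings": [
    {
      "s": {"type": "uri", "value": "https://example.org/item/1"},
      "label": {"type": "literal", "value": "Sample item"},
      "type": {"type": "uri", "value": "https://example.org/type/sample"}
    }
  ]}
}
"""


def rows(result_json: str) -> list:
    """Turn a SPARQL JSON result into a list of {variable: value} dicts.

    Variables that are unbound in a row are simply left out of its dict.
    """
    data = json.loads(result_json)
    variables = data["head"]["vars"]
    return [
        {var: binding[var]["value"] for var in variables if var in binding}
        for binding in data["results"]["bindings"]
    ]


parsed = rows(SAMPLE)
print(parsed)
```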