GEOGRAPHIC ENTITY RETRIEVAL FOR FINDING PLACES

Journal of Data Intelligence, Vol. 3, No. 4 (2022) 401–420

 Rinton Press

GEOGRAPHIC ENTITY RETRIEVAL FOR FINDING PLACES SUITABLE FOR CERTAIN PURPOSES BY USING RELEVANCE GRAPHS ON PLACES AND REVIEWS

YUI MAEKAWA

Department of Computer Science, Tokyo Institute of Technology.

Meguro-ku, Tokyo 152-8550, Japan.


YOSHIYUKI SHOJI

College of Science and Engineering, Aoyama Gakuin University.

Sagamihara, Kanagawa 252-5258, Japan.


MARTIN J. DÜRST

College of Science and Engineering, Aoyama Gakuin University.

Sagamihara, Kanagawa 252-5258, Japan.


This paper proposes a method of ranking geographic entities (places) where a purpose, given as a query, can be achieved. Most existing map search engines accept only the name of a place or the type of a place. Thus, when searchers want to find a suitable place for “guitar practice”, they have to input a place type such as “music studio”. To create such a query, prior knowledge (i.e., that a music studio is suitable for playing guitar) is required. Our proposed method uses online review information on places to enable direct place retrieval from a given purpose query. Our method creates a bipartite graph consisting of places and the words that appear in the reviews of these places. The relevance between the given keyword query and a place is calculated by using the Random Walk with Restart algorithm. Additionally, we expand the graph based on three hypotheses: 1) places that are suitable for the same purpose are similar to each other, and purposes that can be achieved in the same place are similar to each other, 2) the same purpose can be achieved in places with similar metadata, and 3) purposes that have semantically similar meanings can be achieved in the same places. An experiment using real review data taken from Google Maps demonstrated the usefulness of the proposed method. In particular, the experimental results show that the expansion by places’ metadata is effective for finding more relevant places.

Keywords: Place Search, Random Walk with Restart, Online Review

Yui Maekawa contributed to this research while at Aoyama Gakuin University until March 2020


1. Introduction

In recent years, geographic information retrieval has become increasingly popular. A wide range of people use place search services (e.g., Google Maps, Bing Maps) to find stores, facilities, and other places. The users of place search include children who lack sufficient prior knowledge of places and elderly people who are not used to searching. With such a wide range of people using geographic information retrieval services, there is a growing demand for geographic information retrieval algorithms that allow users with little prior knowledge of geographic information to search successfully.

Conventional geographic search systems accept only the name of a place or the type of a place as a query. Therefore, in order to find a place where a specific purpose can be achieved, the user needs to enter the type or the characteristics of the place he or she wants to find. For example, if you want to buy a book, you must search for “bookstore”; if you want to use a delivery service, you must enter the query “post office”. Consider a user who is looking for a place where he or she can do “guitar practice”. Normally, it is necessary to search with a place type query such as “music studio”. However, to create this query, users need prior knowledge, such as “we can practice guitar in a music studio”. Without that prior knowledge, they cannot formulate a query that finds facilities where they can practice guitar. This problem can be solved if places can be searched by purpose, using “guitar practice” as a query.

In addition, even if a searcher knows that guitar can be practiced in a music studio, the query “music studio” may miss many places where the searcher can practice guitar. A music studio is not the only place where a guitar can be played; parks, karaoke rooms, riversides, and many other places also allow it. It is not reasonable to list all these places in the search query.

In this research, we propose a new search algorithm that ranks places by the likelihood that the purpose indicated by the query can be achieved there. For example, if the user enters “guitar practice”, the system ranks specific places such as “Studio FOO Tokyo branch”, “BAR Karaoke Tokyo branch”, or “Tokyo central park”. By allowing users to input a purpose, our search algorithm aims to let a wide range of users search for places more easily, regardless of their prior knowledge.

Such a search model is effective because the difficulty of the search task is asymmetric. It is easy to check whether a purpose can be achieved at a given place by visiting its official website or by calling it. However, it is difficult to compile a list of candidates in the first place. If places with a high likelihood of achieving the purpose can be ranked, users can find a suitable place in very few steps.

In this research, we focus on online reviews of places to realize our search algorithm. Some geographic information services, such as Google Maps, allow users to post reviews of a place. Such reviews describe many actions that users actually performed at the place.

Although online reviews taken from geographic information sites are an important information resource, they are not sufficient for implementing the proposed search algorithm directly. One reason is the limited coverage of the reviews: the reviews of a place usually describe only some of the actions that can be performed there. For instance, not all places where you can practice guitar have a review that says “I practiced my guitar here”. Traditional information retrieval methods based on simple string matching therefore cannot take full advantage of the reviews.

Therefore, we propose a graph-based algorithm that links given purpose queries and places, based on the following three hypotheses:

H1 Mutual Recursive Deduction:

Places that are suitable for the same purpose are similar to each other, and purposes

that can be achieved in the same place are similar to each other.

H2 Expansion by Place Type:

The same purpose can be achieved in places with similar metadata. For instance, if you were able to play guitar in a certain karaoke room, there is a high probability that you can play guitar in another karaoke room.

H3 Expansion by Word Semantics:

Purposes that have semantically similar meanings can be achieved in the same places.

For instance, if a certain park gets a review saying “This place is suitable for playing

ukulele”, this park should also be suitable for playing guitar.

The proposed method performs Random Walk with Restart (RWR) link analysis on a bipartite graph composed of places and the words that appear in reviews of these places. To clarify the effectiveness of our method, we conducted an experiment using real data. For the experiment, we implemented an actual place search system that uses review data obtained from Google Maps. In this system, when a searcher inputs a purpose as a query, they obtain a ranking of places suitable for achieving that purpose. The method’s accuracy was checked by performing actual searches with pre-prepared queries and manually labeling the results.

This paper is an extended version of the work presented at iiWAS2021 [14]. The structure of this paper is as follows. In Section 2, we discuss existing research related to our method. Section 3 describes the details of our search algorithm. In Section 4, the proposed method is evaluated through an experiment. Section 5 discusses the experimental results, and Section 6 presents conclusions and future work.

2. Related Work

This research belongs to the line of work on purpose-oriented search algorithms. We adopt a graph-based approach, extend places by metadata, and extend purposes with synonyms. Therefore, this research is closely related to existing research on geographic information retrieval, expansion of purpose, and locality recommendation.

2.1. Geographic Information Retrieval

Geographic information retrieval is a classic research topic in both the GIS (Geographic Information System) and information retrieval fields. An evaluation campaign called GeoCLEF [15] was held several times in the information retrieval field, and many geographic retrieval methods were proposed and evaluated in its workshops.

Following this kind of research, many studies on geographic information retrieval are still being conducted. Jones et al. [10] organize geographic information retrieval from the perspective of information retrieval and discuss query processing, ranking methods, and evaluation methods. Their survey points out the disambiguation of place names and the difference between human vocabulary and the vocabulary represented on maps as difficulties in this kind of geographic information retrieval. For example, for a search request such as “near a park with lots of greenery”, the park’s official name is not included in the query, and the proximity expressed by “near” cannot be computed by keyword matching.

Major geographic information search systems typically accept place names, place types, and addresses as queries for finding places. Therefore, much research has aimed to enable more flexible input, for example by expanding the query.

Pat et al. [16] developed a geographic information retrieval system that collects location information (geotagged posts) from social networking sites such as Twitter and Instagram and represents the results in terms of territory. By focusing on geotagged posts on social networking sites, they attempted to make the normally static geographic information database dynamic.

Hariharan et al. [7] defined search requests that do not directly include the name of a place as “Spatial-Keyword (SK) queries” and proposed a method to answer them. For example, to process search requests such as “Find shelters with emergency medical facilities in Orange County”, other information sources must be integrated with GIS systems. They proposed “Geographic Information Retrieval (GIR) Systems” as a wrapper that handles multiple GIS systems, and implemented the framework.

Shoji et al. [19] also proposed a method using geotagged tweets for finding places. Their method, named “location2vec”, is based on a word2vec-like algorithm, and it can find similar places by comparing tweets posted around different places. However, since many users post their tweets with automatic geotagging by the SNS (Social Networking Site) system, posts about a place made after moving somewhere else have the wrong geotag. Therefore, the accuracy of the information in geotagged posts is questionable. That research is similar to the present study because it also focuses on social data for geographic information retrieval. However, we chose review information about places instead of SNS posts, because reviews are much more likely than SNS posts to contain information related to the place.

Bauer et al. [4] analyzed offline purchasing needs and proposed a search method for physical brick-and-mortar stores where actual purchases can be made, even though online mail-order sales are now standard. Their method vectorizes both the query keywords representing the object to be purchased and the locations, and ranks the locations by cosine similarity. This research is similar to ours in that the search targets are actual objects. However, our research does not use a simple similarity calculation in a vector space model but link analysis on a bipartite graph. Another difference is that we aim to widen the range of input data in the search. As a result, we can find not only places where people can buy something from a retailer, but also other kinds of places.

Kato et al. [11] expanded the input of place search to allow examples as queries. In their method, searchers input a certain place, and the system finds similar places. This can help find places by purpose, but searchers need to know an example place that is suitable for their purpose.

Purves et al. [18] summarized many studies on GIR and described the importance, for current search engines used on mobile phones, of information retrieval based on the needs of the actual domain. In addition, they point to the application of machine learning techniques as an essential issue in this research field.

Following these studies, our approach proposes a new search methodology that enables the system to accept a “purpose” as its input. We believe that this research is novel and important as a search application that can respond to various information needs beyond the names of geographical objects.

2.2. Expansion of Purpose

In this research, the goal is to improve the recall of search results by inferring additional purposes that can be achieved at the same place. In other words, it becomes possible to find local products and other stores in the same chain even if their reviews do not include the query words. Paraphrasing a query in other words is an actively researched topic in the field of information retrieval, and natural language processing techniques are commonly used in such studies together with information retrieval techniques.

As an example of extending purposes, Pothirattanachaikul et al. [17] proposed a method for extracting alternatives that can achieve the same objectives from community Question Answering (cQA) sites. For example, “taking sleeping pills” and “drinking warm milk” are alternative behaviors that can achieve the same goal of “falling asleep easily”. Their research uses a bipartite graph consisting of question and answer information extracted from the cQA site. By analyzing this graph, they were able to find alternative behaviors by ranking them by similarity.

The expansion of purpose is also a major problem in research on cQA. Jiwoon et al. [8] proposed a method of finding questions with a similar purpose on a cQA site, which can help people who have a purpose but do not want to ask a question on a cQA site. This method focused on how to calculate the similarity of questions. Abujaba et al. [1] similarly focus on paraphrasing during retrieval on cQA sites and created a dataset for research that includes paraphrases. They collected data from WikiAnswers and labeled it on a crowdsourcing site to link questions with the same purpose.

The ideas used in these studies are related to our own ideas that places suitable for the same purpose are similar to each other, and that purposes that can be achieved in the same place are similar to each other. Wang et al. [23] also tackle this problem, using a natural language processing-based approach with syntactic trees. Our method considers both purpose similarity and place similarity.

Our study uses graph processing to close the gap between the vocabulary of geographical object reviews and that of queries, which is similar to query expansion. Likewise, many studies have paraphrased queries to bridge specialized vocabularies and the terms frequently used in ordinary users’ search queries.

One area where rephrasing queries is most important is the medical field. In medical information, there is a significant difference between the terms used by ordinary people and the actual technical terms. For example, a patient might search for “having upset stomach” before searching with the query “gastric ulcer”. In this context, collecting the “consumer health vocabulary” [24], i.e., the terms ordinary people use for medical information, and using it for query expansion is an essential issue. As an example, Stanton et al. [20] propose a method to link phrases used by ordinary users with technical terms such as disease names.

As an example of a geo-specific alternative place search method, Katsumi et al. [12] proposed a method to recommend alternatives to places that the user wanted to visit. To avoid overtourism, they use image similarity and other methods to discover places where the same tourism goals can be achieved. They focused on generic POIs (Points of Interest) suitable for any tourism goal in order to recommend even minor places.

Our study is similar to these related studies in that we exhaustively search for locations

where the same objective can be achieved. On the other hand, the goal of our research

is to match queries and geographic entities, not to discover alternative representations or

alternative POIs directly.

2.3. Locality Recommendation

This research aims to find a place where the user’s objectives can be achieved. With the same aim, some studies extract the characteristics of places and address the problem through recommendation and other approaches.

Kurashima et al. [13] proposed a method for extracting features of a place from blogs and visualizing the experience of a place on a map by topic modeling. In contrast, our research uses reviews, which are more exhaustive but less descriptive than blogs, to estimate what can be done at a location in order to discover geographical objects from a query.

As another study on recommending places, Wang et al. [22] extended the Bookmark-coloring algorithm to represent past behavior on social media sites, location information, relationships between users, and user similarity as a graph. By using the similarity between users, they can recommend the next place the user is likely to visit with higher accuracy than conventional recommendation methods.

Recommending places for users to visit next has been widely studied as POI recommendation. As an example of typical POI recommendation, Chen et al. [5] propose a collaborative filtering-based POI recommendation method based on the check-in information of LBSN (Location-Based Social Networking) users and the category information of the POIs. In recent years, there has been an increase in studies that, like ours, use information from review sites for POI recommendation. For example, Baral et al. [2] proposed ReEL, which uses neural networks to recommend places from reviews. They extracted aspects from user reviews and created a more accurate POI recommendation method.

Many studies have been conducted to estimate the nature of a place from information gathered from social media and CGM (Consumer Generated Media) sites. Among them, many studies use LBSN sites such as Twitter [21]. The most typical applications are real-world event detection and travel assistance. For instance, Dong et al. [6] proposed a method of finding events by using Flickr photos. As a task of estimating the nature of a place, Zhang et al. [25] integrate social media information to estimate the atmosphere and usage of a street.

POI discovery is another important element in geographic recommendation. Discovering spots that attract people’s attention from social media is close to this research’s task of discovering places that are suitable for achieving objectives. Some research uses social media information to detect POIs and their usage or category [9], and some studies, like this one, have used review information [3].


3. Method Proposed

This section describes a new algorithm: a method that ranks places suitable for a purpose directly given as a query. To realize such a retrieval model, we extract places, and the actions that were taken at these places, from reviews of the places. Not all actions that can be taken at a place are described in its reviews. Therefore, to find places whose reviews do not mention a certain purpose, or places that have not been reviewed at all, the method has to infer what can be done at such places.

To extend purposes, we adopt the following three hypotheses into a graph-based algorithm:

H1 Mutual Recursive Deduction,

H2 Expansion by Place Type, and

H3 Expansion by Word Semantics.

The first hypothesis is at the core of our algorithm. The places where people can achieve the same purpose are similar to each other. For instance, a park and a river beach are similar places, because you can do the same things (e.g., playing a musical instrument, jogging, playing catch) in both of them. In addition, the purposes that can be achieved in the same place are similar to each other. For instance, eating hot dogs and drinking beer are similar purposes, because both of them can be achieved in the same places (e.g., diners, beer halls, baseball stadiums). To reflect this hypothesis, the method creates a bipartite graph consisting of places and purposes, and link analysis on this graph performs the mutually recursive calculation.

The second hypothesis is based on the idea that the same purpose can be achieved in similar types of places. For instance, if you were able to buy a burrito at a certain Starbucks, you should be able to buy one at another Starbucks branch. In addition, you might be able to buy burritos in other coffee shops. To integrate this hypothesis, our method modifies the bipartite graph by adding links between places. The last hypothesis means that purposes which have semantically similar meanings can be achieved in the same places. For instance, if it is possible to buy toilet paper at a certain store, it will be possible to buy tissue paper at the same store, because these two items are semantically similar products. Our method integrates this hypothesis by adding virtual links between purposes.

3.1. Creating the Bipartite Graph for Mutual Recursion

Our method uses the review information about places as the data source that reﬂects purposes

that can be achieved at each place. First, our method makes a bipartite graph that consists

of words and places to express the ﬁrst hypothesis. The words that appear in reviews of the

same place are likely to be similar to each other, and places with reviews containing the same

words are similar to each other. The graph contains two types of nodes: all the places in the

dataset, and all the words in all the reviews for these places.

A schematic diagram of the entire dataset is shown in Figure 1. The review data is represented as the relationship between a place $l_i$ and a word $w_j$ that appears in a review of that place. Furthermore, there exist relationships between a place and its metadata, and between a word and its topics.

First, we create a weighted directed bipartite graph, focusing on the relationship between a place and the words in its reviews. As a pre-processing step, each review sentence was divided into words. To cleanse the review data written in natural language, words were selected by part of speech: only verbs, nouns, and adjectives were treated as nodes. Each word was lemmatized; that is, verbs were converted to their base form and inflections such as plurals were removed. Cleansing by frequency was also performed: words that appeared too frequently or very rarely were removed. Finally, places and words were linked by edges if the word appeared in a review of the place. The bipartite graph created in this phase is shown as a subgraph in the middle of Figure 1, with red and blue lines as edges.

Fig. 1. A graph representation of the whole place-review dataset
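The preprocessing and linking steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that the reviews have already been tokenized, lemmatized, and part-of-speech tagged by some external tool (the coarse tags `VERB`, `NOUN`, and `ADJ` are an assumption), and the frequency cut-offs `min_places` and `max_place_ratio` are hypothetical, since the exact thresholds are not given here. Word frequency is measured as the number of places whose reviews contain the word.

```python
from collections import defaultdict

# Coarse part-of-speech tags kept as word nodes (assumed tag set).
KEPT_POS = {"VERB", "NOUN", "ADJ"}


def build_place_word_edges(reviews, min_places=2, max_place_ratio=0.5):
    """Return the set of (place, word) edges of the bipartite graph.

    reviews: iterable of (place_id, tokens), where tokens is a list of
        (lemma, pos) pairs produced by an external tagger/lemmatizer.
    min_places, max_place_ratio: hypothetical frequency cut-offs; a word is
        kept only if it occurs in at least `min_places` places and in at most
        `max_place_ratio` of the places.
    """
    # Collect the lemmas of verbs, nouns, and adjectives for each place.
    words_of_place = defaultdict(set)
    for place, tokens in reviews:
        for lemma, pos in tokens:
            if pos in KEPT_POS:
                words_of_place[place].add(lemma)

    # Count in how many places each word occurs.
    place_count = defaultdict(int)
    for words in words_of_place.values():
        for word in words:
            place_count[word] += 1

    # Drop words that are too rare or too common.
    n_places = len(words_of_place)
    kept = {w for w, c in place_count.items()
            if c >= min_places and c / n_places <= max_place_ratio}

    # Link a place and a word if the word survives and occurs in its reviews.
    return {(place, word)
            for place, words in words_of_place.items()
            for word in words if word in kept}
```

With these cut-offs, a generic adjective such as “good” that appears in the reviews of every place is removed, while words shared by only some places remain as links.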

Second, we create the adjacency matrix $M$ from the created graph. Figure 2 shows a schematic diagram of the final shape of the adjacency matrix $M$, where $L$ is the set of all the place nodes in the graph and $W$ is the set of all the word nodes in the graph. The matrix $M$ is a square matrix of dimension $(|L| + |W|)$, where $|L|$ and $|W|$ denote the number of elements in the respective sets.

Here, the value of each element $m_{ij}$ of the matrix $M$ is defined as follows. Let $N(l_i)$ be the subset of $W$ connected to $l_i$, and $N(w_i)$ be the subset of $L$ connected to $w_i$. In Figure 2, the links from places to words are located in the lower-left blue part (i.e., $i > |L|$ and $j \le |L|$); there, $m_{ij}$ is set to 1 if $w_i$ is an element of $N(l_j)$, and to 0 otherwise. Similarly, the upper-right red part of the matrix in Figure 2 represents the links from words to places; there, $m_{ij}$ is set to 1 if $l_i$ is an element of $N(w_j)$, and to 0 otherwise. That is, in the lower-left and upper-right parts of the matrix, $m_{ij}$ is 1 if the $i$-th node and the $j$-th node are connected by an edge, and 0 otherwise. The review information was rearranged into four relationships, which can be represented as a single adjacency matrix.
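A small sketch of how the place-word edges could be laid out in the square matrix $M$ described above, using plain Python lists. The node ordering (places first, then words) follows Figure 2, and the convention that entry `[i][j]` holds the edge from node $j$ to node $i$ follows the description of the lower-left and upper-right blocks. The function name `build_bipartite_matrix` is hypothetical.

```python
def build_bipartite_matrix(places, words, edges):
    """Return the (|L|+|W|) x (|L|+|W|) 0/1 adjacency matrix of the graph.

    Indices 0..|L|-1 are places and |L|..|L|+|W|-1 are words; entry [i][j]
    is 1 if there is an edge from node j to node i.
    edges: iterable of (place, word) pairs.
    """
    n_places = len(places)
    place_index = {p: k for k, p in enumerate(places)}
    word_index = {w: n_places + k for k, w in enumerate(words)}
    size = n_places + len(words)
    m = [[0] * size for _ in range(size)]
    for place, word in edges:
        p, w = place_index[place], word_index[word]
        m[w][p] = 1  # lower-left block: link from place p to word w
        m[p][w] = 1  # upper-right block: link from word w to place p
    return m
```

For an unweighted bipartite graph, the two blocks are transposes of each other, so the matrix is symmetric at this stage; asymmetry only appears after the weighting and normalization steps.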

Finally, we normalized the weights of the edges between words and places. Without such weighting, scores would accumulate cyclically in dense parts of the graph as the number of edges increases. Therefore, we divide the weight of an edge by the number of outgoing edges of its source node.

Fig. 2. An overview of the expanded adjacency matrix $M$, which represents the relationships between places ($L$) and words ($W$).

3.2. Calculating Place Similarity for Place Type Expansion

Next, to adopt Hypothesis 2 (Expansion by Place Type), we added information about the relationships between places to the graph. We hypothesize that the same purpose can be achieved at similar places, for instance, “Starbucks in Tokyo” and “Starbucks in Kyoto”, which are branches of the same chain. By considering the similarity between places, it becomes possible to find places that are not directly reviewed. Therefore, we extend the graph to take into account the similarity between places by comparing their metadata.

In most online map applications, such as Google Maps, metadata exists for each place. A typical kind of metadata is the category of a place, such as “restaurant” or “hospital”. In this research, we used such categorical information as a feature of places. We calculated the degree of association between places from this metadata and added the resulting similarity to the graph. There are various methods for calculating the degree of association between objects; in this research, we adopt the cosine similarity of the places’ category vectors as the most straightforward approach.

The metadata of a place can be considered a vector of Boolean values, which allows us to compute the similarity between places in a vector space. Let $C$ be the set of all metadata. The vector $\mathbf{l}_i$ of the place $l_i$ is a $|C|$-dimensional vector whose $j$-th element is set to 1 if the place has a link to the metadata $c_j \in C$, and to 0 otherwise (see Figure 3).

Fig. 3. Vector representation of a place by using category metadata
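The Boolean category vector can be illustrated as below. The category names are hypothetical examples; with the category order shown, a place tagged as both a karaoke room and a music studio obtains the vector 0 1 1 0.

```python
def category_vectors(place_categories, categories):
    """Map each place to a |C|-dimensional 0/1 vector over `categories`.

    place_categories: dict place -> set of category tags.
    categories: ordered list of all tags (the set C).
    """
    return {place: [1 if c in cats else 0 for c in categories]
            for place, cats in place_categories.items()}


vectors = category_vectors({"karaoke_a": {"karaoke", "music_studio"}},
                           ["cafe", "karaoke", "music_studio", "park"])
print(vectors["karaoke_a"])  # [0, 1, 1, 0]
```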

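The similarity $\mathrm{sim}(l_i, l_j)$ between the places $l_i$ and $l_j$ is defined as

$$\mathrm{sim}(l_i, l_j) = \frac{\mathbf{l}_i \cdot \mathbf{l}_j}{\|\mathbf{l}_i\|\,\|\mathbf{l}_j\|}, \qquad (1)$$

which is the cosine similarity between the two vectors.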

Computational cost is a major problem for the actual link analysis. In most cases, only 1 to 5 category tags are linked to a place, and the total number of distinct category tags is less than 100. Because most places have only a few tags and some tags are used too frequently, we eliminated frequent tags that have no explanatory power. In our implementation, we also set a threshold on the similarity and cut off the links below it.

Thus, we used metadata consistency to extend the graph by attaching virtual edges between places. In the upper-left (orange) part of the matrix in Figure 2, $m_{ij}$ is set to 1 if the metadata of places $l_i$ and $l_j$ are highly similar, and to 0 otherwise.
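The following sketch combines the cosine similarity of Eq. (1), the removal of overly frequent tags, and thresholding into Boolean place-place links. The cut-off values `max_tag_ratio` and `threshold` are hypothetical, because the concrete values are not stated here, and the function names are illustrative only.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def place_similarity_links(place_categories, max_tag_ratio=0.5, threshold=0.5):
    """Return sorted place pairs whose category vectors are similar enough.

    Tags attached to more than `max_tag_ratio` of all places are dropped as
    uninformative before the vectors are built (hypothetical cut-off).
    """
    n = len(place_categories)
    tag_count = Counter(t for cats in place_categories.values() for t in set(cats))
    tags = sorted(t for t, c in tag_count.items() if c / n <= max_tag_ratio)
    vec = {p: [1 if t in cats else 0 for t in tags]
           for p, cats in place_categories.items()}
    places = sorted(vec)
    links = set()
    for a in range(len(places)):
        for b in range(a + 1, len(places)):
            if cosine(vec[places[a]], vec[places[b]]) >= threshold:
                links.add((places[a], places[b]))  # Boolean edge, m_ij = 1
    return links
```

Dropping a tag shared by every place (for example, a generic tag such as `establishment`) keeps unrelated places, such as a karaoke room and a park, from obtaining a non-zero similarity (about 0.5) through that tag alone.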

3.3. Calculating Word Similarity for Purpose Expansion

Next, we extend the graph by focusing on Hypothesis 3 (Expansion by Word Semantics). The degree of association between words is calculated and added to the graph. For example, “guitar” and “ukulele” are close in meaning. Therefore, we can extend the results so that places where you can achieve “guitar practice” are also regarded as places where you can achieve “ukulele practice”, by adding a virtual link between the two words. This expansion aims to let reviews that do not mention the purpose directly still be reflected in the rankings of places. Here, we extend the graph to take into account the similarity between words.

Computing the semantic similarity between words is a general problem, and it can be addressed by vectorization with methods such as LDA, LSI, or Word2Vec. Our method computes word similarity using Word2Vec. A Wikipedia corpus was used for training the Word2Vec model, because encyclopedic sites are suitable resources for calculating lexical similarity.

The word similarity $\mathrm{sim}(w_i, w_j)$ (where $w_i$ is the $i$-th word) can be used to weight the links between words in the graph. The distributed representation $\mathbf{w}_i$ of a word $w_i$ is defined as follows:

$$\mathbf{w}_i = \mathrm{w2v}(w_i). \qquad (2)$$

By using this vector, the similarity between the words $w_i$ and $w_j$ can be defined as

$$\mathrm{sim}(w_i, w_j) = \frac{\mathbf{w}_i \cdot \mathbf{w}_j}{\|\mathbf{w}_i\|\,\|\mathbf{w}_j\|}, \qquad (3)$$

which is the cosine similarity between the two vectors.

The similarity $\mathrm{sim}(w_i, w_j)$ takes values in $[-1, 1]$, with semantically close words scoring near 1. As with the similarity between places, we apply a threshold to this value to reduce the computational cost, and finally use $\mathrm{sim}(w_i, w_j)$ as a Boolean value in the calculation. The lower-right (green) part of Figure 2 represents $\mathrm{sim}(w_i, w_j)$ for each pair of words in the dataset.
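A sketch of Eq. (3) followed by thresholding into Boolean word-word links. The embeddings are passed in as a plain dictionary so that the example does not depend on a particular Word2Vec library; in the described method they would come from a model trained on a Wikipedia corpus. The toy three-dimensional vectors and the threshold of 0.6 are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def word_similarity_links(vectors, threshold=0.6):
    """Return sorted word pairs whose embedding cosine similarity >= threshold.

    vectors: dict word -> list of floats (e.g., looked up from a trained model).
    threshold: hypothetical cut-off turning sim(w_i, w_j) into a Boolean link.
    """
    words = sorted(vectors)
    return {(a, b)
            for i, a in enumerate(words)
            for b in words[i + 1:]
            if cosine(vectors[a], vectors[b]) >= threshold}


toy = {"guitar": [0.9, 0.1, 0.0], "ukulele": [0.8, 0.2, 0.1], "coffee": [0.0, 0.1, 0.9]}
print(word_similarity_links(toy))  # {('guitar', 'ukulele')}
```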

3.4. Ranking Places by Random Walk with Restart

At this point, we have constructed the matrix $M$ representing the expanded graph shown in Figure 4; it contains all the necessary relationships between places and words, among places, and among words. By processing this matrix, it is possible to compute the relevance between nodes in the graph. The relevance between a word node and a place node reflects how closely the words in the query are related to the place; in other words, it can be used to rank the places where the purpose can be achieved. We adopted Random Walk with Restart (RWR) as the algorithm for calculating the degree of association between nodes in our graph.

Fig. 4. Place-word graph expanded with word semantic similarity and place metadata similarity

First, to perform relevance calculations with RWR, we transformed the graph matrix $M$ into a transition probability matrix. This was done by normalizing the matrix by columns, that is, by dividing each entry by the sum of the weights of the outgoing edges of its source node. Therefore, we need to consider $M$ as a directed graph: the link from a place to a word and the reverse link have different weights.

Note that it is possible to change the weights for each hypothesis here. For example, to

increase only the similarity score between places, a weight can be applied only to the elements

in the upper left part of Figure 2 before this transformation.


The formulation for the actual calculation is as follows. Let $L$ be the set of all place (geographic entity) nodes in the graph and $W$ the set of all word nodes, and let $|L|$ and $|W|$ denote the number of elements in each set. $N(l_i)$ is the subset of $W$ connected by the edges exiting $l_i$, and $N(w_i)$ is the subset of $L$ connected by the edges exiting $w_i$. The function $\mathrm{sim}(l_i, l_j)$ denotes the similarity between the $i$-th and the $j$-th place, and the function $\mathrm{sim}(w_i, w_j)$ denotes the similarity between the $i$-th and the $j$-th word. The matrix $M$, which represents the graph structure, is defined as











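$$
m_{ij} =
\begin{cases}
\alpha\,\mathrm{sim}(l_i, l_j) & \text{if } i \le |L| \text{ and } j \le |L|,\\
\cdots & \text{if } i \le |L|,\ j > |L| \text{ and } l_i \in N(w_j),\\
0 & \text{if } i \le |L|,\ j > |L| \text{ otherwise},\\
\cdots & \text{if } i > |L|,\ j \le |L| \text{ and } w_i \in N(l_j),\\
0 & \text{if } i > |L|,\ j \le |L| \text{ otherwise},\\
\beta\,\mathrm{sim}(w_i, w_j) & \text{if } i > |L| \text{ and } j > |L|,
\end{cases}
\qquad (4)
$$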

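where $\alpha$ and $\beta$ are the weights of the hypotheses ($\alpha$ for H2, $\beta$ for H3); both take values from 0 to 1, and $\alpha + \beta \le 1$. The transition probability matrix $M'$, which is $M$ normalized by its rows, is defined by

$$m'_{ij} = \frac{m_{ij}}{\sum_{k=1}^{|L|+|W|} m_{ik}}, \qquad (5)$$

where $m_{ij}$ is an element of $M$ and $m'_{ij}$ is the corresponding element of $M'$.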
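A minimal sketch of assembling $M$ as a block matrix and applying the row normalization of Eq. (5), using dense NumPy arrays for readability. The inputs `B`, `S_L`, and `S_W` are hypothetical names for the place-word incidence matrix and the Boolean place and word similarity matrices; treating each place-word link as weight 1 is our assumption and not necessarily the weighting used in Eq. (4).

```python
import numpy as np


def build_transition(B, S_L, S_W, alpha=0.1, beta=0.1):
    """Assemble M (places first, then words) and row-normalize it as in Eq. (5).

    B   : |L| x |W| 0/1 place-word incidence (unit weights assumed)
    S_L : |L| x |L| Boolean place similarity, scaled by alpha (H2)
    S_W : |W| x |W| Boolean word similarity, scaled by beta (H3)
    """
    M = np.block([[alpha * S_L, B],
                  [B.T, beta * S_W]]).astype(float)
    row_sums = M.sum(axis=1, keepdims=True)
    # Rows without outgoing edges stay all-zero instead of dividing by zero.
    return np.divide(M, row_sums, out=np.zeros_like(M), where=row_sums > 0)


if __name__ == "__main__":
    B = np.array([[1, 0], [1, 1]])            # 2 places, 2 words
    S_L = np.array([[0, 1], [1, 0]])          # the two places are similar
    S_W = np.zeros((2, 2))                    # no similar words
    print(build_transition(B, S_L, S_W).round(3))
```

Scaling `S_L` and `S_W` before normalization is what lets α and β shift probability mass toward or away from the H2 and H3 edges.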

RWR computes the degree of association between nodes by performing a random walk on the graph and, at each step, jumping back to the initial node with a fixed probability. Normally, the restart distribution for an initial node $q$ is represented as a one-hot vector $\mathbf{q}$ whose $q$-th element is 1 and whose other elements are 0. The word nodes of the words that appear in the given query could be used as the initial nodes.

However, in this research, we have to consider queries that consist of multiple words, such as “guitar practice”. If the query consists of two or more words, restarting the random walk from all query words gives high relevance to place nodes that are related to only some of the query words. This is because the words in a query are not independent. For example, the query “guitar practice” can be split into two word nodes, “guitar” and “practice”. If these two words are used independently as start nodes, the search results will be a mixture of places associated with “guitar” and places associated with “practice”, similar to the result of an OR search in a traditional search engine. A place node that is highly associated with “practice” may not be suitable for “guitar practice”; it might be suitable for other kinds of practice, such as baseball practice or painting practice. Likewise, not all places related to “guitar” are suitable for “guitar practice”; some of them may be good places to repair a guitar or to buy a new one.

The solution to this problem (i.e., realizing an AND search) is to use place nodes instead of word nodes as the initial nodes. We set the initial nodes to only those place nodes where all the words in the query appear together in a single review. If there is more than one such place, the random walk restarts at each of these place nodes with equal probability. This enables the algorithm to return more search results for multi-word queries without a loss of accuracy.

The set of initial nodes is represented as an $(|L| + |W|)$-dimensional vector $r$. Each element $r_i$ is 1 if the $i$-th node meets the condition and 0 otherwise. To convert $r$ into a probability vector, it is normalized so that its elements sum to 1.
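The AND-style restart set can be sketched as follows. Node indices follow the ordering of Eq. (4) (places first, then words); the review format (one set of words per review) is a hypothetical input. A query that matches no place leaves $r$ at zero, a case this sketch does not handle.

```python
import numpy as np


def restart_vector(reviews_by_place, n_words, query_words):
    """Build r over |L| + |W| nodes (places first, then words).

    reviews_by_place: list indexed by place; each entry is a list of reviews,
    each review given as a set of words (hypothetical input format).
    """
    n_places = len(reviews_by_place)
    r = np.zeros(n_places + n_words)
    q = set(query_words)
    for i, reviews in enumerate(reviews_by_place):
        # AND semantics: a single review must contain every query word.
        if any(q <= review for review in reviews):
            r[i] = 1.0
    total = r.sum()
    return r / total if total > 0 else r


if __name__ == "__main__":
    reviews = [
        [{"guitar", "practice", "studio"}],   # both words in one review -> restart node
        [{"guitar"}, {"practice"}],           # words only in separate reviews -> excluded
        [{"practice", "guitar"}],
    ]
    print(restart_vector(reviews, n_words=3, query_words=["guitar", "practice"]))
```

Place 1 illustrates why the condition is checked per review: its reviews mention both words, but never together.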

The RWR score for each node is calculated by the power method, i.e., by repeating the update

$$p = (1 - c)\,M'^{\top} p + c\,r, \qquad (6)$$

where $c$ is the restart probability. We use $r$ as the initial value of $p$ and repeat the update until $p$ converges. After convergence, each element $p_u$ of the final $p$ can be used as the relevance of the $u$-th node to the given purpose query. The search result ranking is obtained by sorting all places $l_u \in L$ by $p_u$ in descending order.
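A sketch of the power iteration in Eq. (6) with SciPy sparse matrices, the library we used in the experiment (Section 4.2). Because $M'$ is row-normalized, scores are propagated along its transpose. The restart probability `c=0.15`, the tolerance, and the iteration cap are illustrative values, not parameters reported in this paper.

```python
import numpy as np
import scipy.sparse as sp


def rwr(P, r, c=0.15, tol=1e-10, max_iter=1000):
    """Power iteration for p = (1 - c) P^T p + c r (Eq. 6).

    P : row-stochastic transition matrix M' (ndarray or scipy.sparse)
    r : restart probability vector
    c : restart probability (illustrative default)
    """
    Pt = sp.csr_matrix(P).T.tocsr()   # transpose once, reuse in every iteration
    p = r.copy()                       # the initial value of p is r
    for _ in range(max_iter):
        p_next = (1 - c) * (Pt @ p) + c * r
        if np.abs(p_next - p).sum() < tol:   # L1 change as convergence test
            return p_next
        p = p_next
    return p


def rank_places(p, n_places):
    """Indices of place nodes (the first n_places entries) sorted by score, best first."""
    return list(np.argsort(-p[:n_places], kind="stable"))


if __name__ == "__main__":
    P = np.array([[0.0, 0.5, 0.5], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
    r = np.array([1.0, 0.0, 0.0])
    scores = rwr(P, r)
    print(scores.round(4), rank_places(scores, 3))
```

Since $M'$ is row-stochastic and $r$ sums to 1, every iterate also sums to 1, so the scores can be read directly as a probability distribution over nodes.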

4. Experiment

We evaluated the usefulness of the method in an experiment using real data collected from Google Maps. The search results of five methods for nine prepared purpose queries were evaluated manually: an evaluator judged each of the top-ranked places.

There was only one evaluator, because whether an action is feasible at a given place can be determined objectively. When the evaluator was unsure about a place, they consulted the official website of that place, or called it to ask whether people could achieve their purpose there.

4.1. Dataset

For the experiment, we used review data and place metadata collected with the Places API of Google Maps. Because Google Maps limits the amount of data that can be collected within a given period, we restricted the search to an area of about 80 km² in a densely populated part of Tokyo, Japan, mainly in the Shinjuku, Shibuya, and Chiyoda wards, and collected all the places (i.e., geographic entities such as shops and facilities) contained in this area. Figure 5 shows the area covered by the actual data set.

The list of places in an area and the reviews of these places had to be collected via different APIs. The Google Find Place API returns only the top 32 results within the specified area. Therefore, whenever the number of places returned for an area reached this upper limit, we divided the area into four quadrants and called the API recursively on each of them. Continuing this subdivision down to squares of 25 m, we obtained 261,492 places. The reviews for these places were collected using the Place Details API. Due to API limitations, only the top five reviews for each place were obtained. This resulted in 85,942 places with at least one review containing text.
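The recursive subdivision can be sketched as a quadtree crawl. `search` is a hypothetical stand-in for the place-search call (it does not reproduce the Google API's actual signature), boxes use an assumed local coordinate system in metres, and `cap=32` and `min_size=25.0` mirror the limits given in the text. The fake search function only exists to make the sketch runnable.

```python
from typing import Callable, Dict, List, Set, Tuple

# (min_x, min_y, max_x, max_y) in metres; a hypothetical local coordinate system
BBox = Tuple[float, float, float, float]


def crawl(bbox: BBox, search: Callable[[BBox], List[str]],
          cap: int = 32, min_size: float = 25.0) -> Set[str]:
    """Collect place IDs, splitting a box into quadrants while a query hits the cap."""
    found = search(bbox)
    min_x, min_y, max_x, max_y = bbox
    too_small = (max_x - min_x) <= min_size and (max_y - min_y) <= min_size
    if len(found) < cap or too_small:
        return set(found)  # complete result, or the smallest allowed box
    mid_x, mid_y = (min_x + max_x) / 2, (min_y + max_y) / 2
    quadrants = [(min_x, min_y, mid_x, mid_y), (mid_x, min_y, max_x, mid_y),
                 (min_x, mid_y, mid_x, max_y), (mid_x, mid_y, max_x, max_y)]
    places: Set[str] = set()
    for quad in quadrants:
        places |= crawl(quad, search, cap, min_size)  # set union removes boundary duplicates
    return places


def make_fake_search(points: Dict[str, Tuple[float, float]], cap: int = 32):
    """Stand-in for the place-search API: returns at most `cap` IDs inside a box."""
    def search(b: BBox) -> List[str]:
        hits = [pid for pid, (x, y) in points.items()
                if b[0] <= x <= b[2] and b[1] <= y <= b[3]]
        return sorted(hits)[:cap]
    return search


if __name__ == "__main__":
    pts = {f"p{i}": ((i * 37) % 1000 + 0.5, (i * 91) % 1000 + 0.5) for i in range(200)}
    print(len(crawl((0.0, 0.0, 1000.0, 1000.0), make_fake_search(pts))))
```

A single call over the whole area returns only 32 of the 200 synthetic places, while the recursive crawl recovers all of them.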

4.2. Implementation

The reviews of the 85,942 places were segmented into words with the Japanese morphological analyzer MeCab. This step was necessary because words in Japanese text are not separated by spaces. We used the mecab-ipadic-NEologd dictionary, which includes neologisms frequently used on social media. The words used in our experiment were limited to verbs, nouns, and adjectives, and verbs were normalized to their base (dictionary) form.
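To illustrate the part-of-speech filtering, the sketch below parses MeCab's default output for an IPADIC-format dictionary (the format mecab-ipadic-NEologd follows): each token line holds the surface form, a tab, and comma-separated features whose first field is the part of speech and whose seventh field is the base form. Such a string would come from a call like `MeCab.Tagger().parse(text)` in the mecab-python3 binding; the sample input here is hand-written, not taken from our dataset.

```python
KEEP_POS = {"名詞", "動詞", "形容詞"}  # noun, verb, adjective (IPADIC tag names)


def content_words(mecab_output: str) -> list:
    """Extract nouns, verbs, and adjectives from IPADIC-format MeCab output.

    Token lines look like 'surface<TAB>pos,pos1,pos2,pos3,ctype,cform,base,...'.
    Verbs are replaced by their base form (feature index 6); other words keep
    their surface form.
    """
    words = []
    for line in mecab_output.splitlines():
        if line == "EOS" or "\t" not in line:
            continue
        surface, feature = line.split("\t", 1)
        fields = feature.split(",")
        if fields[0] not in KEEP_POS:
            continue
        if fields[0] == "動詞" and len(fields) > 6 and fields[6] != "*":
            words.append(fields[6])  # e.g., the conjugated "し" becomes "する"
        else:
            words.append(surface)
    return words


if __name__ == "__main__":
    sample = ("ギター\t名詞,一般,*,*,*,*,ギター,ギター,ギター\n"
              "を\t助詞,格助詞,一般,*,*,*,を,ヲ,ヲ\n"
              "練習\t名詞,サ変接続,*,*,*,*,練習,レンシュウ,レンシュー\n"
              "し\t動詞,自立,*,*,サ変・スル,連用形,する,シ,シ\n"
              "た\t助動詞,*,*,*,特殊・タ,基本形,た,タ,タ\n"
              "EOS\n")
    print(content_words(sample))
```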


Fig. 5. Area from which places and reviews were collected from Google Maps. It includes Tokyo, Shibuya, and Shinjuku stations, the main train stations in Tokyo, Japan.

Words were cleansed by frequency: rarely used words and words that appeared too often were removed. Specifically, we removed words that appeared in fewer than 50 of the 85,942 reviews and words that appeared in more than 40 percent of the reviews. In the end, 9,816 words were used as nodes in the graph.
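The frequency-based cleansing amounts to a document-frequency filter. In this sketch a "document" could be a single review or the combined reviews of one place; which unit the counts above refer to is an assumption left to the caller. The thresholds default to the values in the text (at least 50 documents, at most 40 percent).

```python
from collections import Counter
from typing import Iterable, List, Set


def select_vocabulary(documents: List[Iterable[str]], min_df: int = 50,
                      max_df_ratio: float = 0.4) -> Set[str]:
    """Keep words occurring in at least `min_df` documents and in at most
    `max_df_ratio` of all documents (document frequency, not raw counts)."""
    df = Counter()
    for words in documents:
        df.update(set(words))        # count each word once per document
    limit = max_df_ratio * len(documents)
    return {w for w, c in df.items() if min_df <= c <= limit}


if __name__ == "__main__":
    docs = [["a", "b"], ["a", "c"], ["a", "b"], ["a", "d"], ["b", "e"]]
    # Small thresholds for the toy data: "a" is too common, c/d/e too rare.
    print(select_vocabulary(docs, min_df=2, max_df_ratio=0.6))
```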

Next, we precomputed the similarity between places using category tags. Each place in Google Maps has at most five category tags. We used the 97 categories assigned to the collected places, excluding the very frequent categories “establishment” and “point of interest”, to generate a Boolean vector for each place, and computed the cosine similarity between these vectors. In this experiment, to limit the computational cost, we connected only places that have three or more categories and whose category vectors are exactly the same.
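One reading of this place-similarity rule, sketched in Python: two places are linked only when each has at least three remaining categories and their Boolean category vectors are identical (identical Boolean vectors have cosine similarity 1). Grouping places by their category set avoids comparing all pairs. The input format and the Places API type strings `establishment` and `point_of_interest` used for the exclusion are assumptions.

```python
from collections import defaultdict
from itertools import combinations
from typing import Iterable, List, Set, Tuple

# Assumed spellings of the two very frequent tags excluded in the text.
EXCLUDED = frozenset({"establishment", "point_of_interest"})


def similar_place_pairs(categories_by_place: List[Iterable[str]],
                        min_categories: int = 3) -> Set[Tuple[int, int]]:
    """Pairs (i, j), i < j, of places with identical category sets of size >= min_categories."""
    groups = defaultdict(list)
    for i, cats in enumerate(categories_by_place):
        key = frozenset(cats) - EXCLUDED   # same set <=> same Boolean vector
        if len(key) >= min_categories:
            groups[key].append(i)
    pairs: Set[Tuple[int, int]] = set()
    for members in groups.values():
        pairs.update(combinations(members, 2))   # members are in ascending order
    return pairs


if __name__ == "__main__":
    cats = [{"restaurant", "food", "cafe", "establishment"},
            {"cafe", "food", "restaurant", "point_of_interest"},
            {"restaurant", "food"},           # only two categories: never linked
            {"restaurant", "food"},
            {"store", "food", "cafe"}]
    print(similar_place_pairs(cats))
```

Each returned pair corresponds to an entry with $\mathrm{sim}(l_i, l_j) = 1$ in the place-place block.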

The similarity between words was also calculated in advance. In the proposed method, the words in the graph are connected to each other by virtual edges to account for semantically similar purposes. We computed the similarity for all pairs of the 9,816 word nodes, using a word2vec model trained on Wikipedia data with gensim, a Python topic modeling library. To keep the matrix sparse and reduce the computational effort, only pairs with a similarity of 0.5 or higher were adopted; all other pairs were treated as having zero similarity.

Finally, we ran Random Walk with Restart to compute the fit between the query and the places. To speed up computation on the square matrix of 95,758 dimensions (85,942 places plus 9,816 words), we used the Python library SciPy.

4.3. Comparative Methods


To analyze the effectiveness of the three hypotheses, we prepared the following five methods:

• All (H1, H2, H3) is the proposed method, which considers all hypotheses (α = 0.1, β = 0.1),

• Place Only (H1, H2) is a variant that considers only place type similarity (α = 0.1, β = 0),

• Word Only (H1, H3) is a variant that considers only the semantic similarity of words (α = 0, β = 0.1),

• No Expansion (H1 only) is a plain method that considers neither place nor word similarity (α = 0, β = 0), and

• Baseline is a traditional search algorithm that only finds places whose reviews directly contain all query words.

For each of these five methods, a set of places to be evaluated was created for the prepared queries. The top 20 places of each ranking were evaluated. For labeling, the places were presented to the evaluator in random order.

4.4. Answer Labeling

Nine queries were prepared (see Table 1). For these queries, the search result rankings were

obtained for the ﬁve methods above. The places ranked in the top 20 of these search results

were manually labeled with binary values: 1 if it was possible to achieve the purpose there,

0 otherwise. Since the search result of the baseline method is not a ranking, 20 randomly

selected places in its result were evaluated.

Labeling was performed by a single evaluator, because it is objectively possible to deter-

mine whether or not the purpose is achievable at a given place. If in doubt about whether a

purpose was achievable, the evaluator was allowed to check the websites or make a phone call

to the place.

Note that this research does not consider the time of day or season (i.e., methods ignore

the timestamps of reviews). For this reason, places whose purpose is achievable during a

certain time of the year (e.g., a swimming pool that is open only in summer) were labeled as

correct. Similarly, places where it was possible in the past to achieve the purpose (e.g., places

that changed their business, or closed) were also labeled as correct.

4.5. Result

This section reports precision and ranking quality per method and per query, as well as actual outputs. Table 1 shows p@k (precision at k, here with k = 20) and nDCG (normalized Discounted Cumulative Gain) for the nine queries used in the experiment. nDCG cannot be computed for the Baseline because it is a Boolean search, not a ranking.
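For reference, a sketch of the two metrics with binary relevance labels. The nDCG variant is not spelled out here, so the log2 discount and the choice of the ideal ranking (the same labels sorted in descending order) are assumptions, and the sketch is not guaranteed to reproduce the nDCG values in Table 1. Precision is divided by k even when fewer than k places are returned, which matches the Baseline's p@20 of 0.15 for “guitar practice” (3 relevant places among the 4 found, divided by 20).

```python
import math
from typing import Sequence


def precision_at_k(rels: Sequence[int], k: int = 20) -> float:
    """Fraction of relevant items in the top k; missing positions count as non-relevant."""
    return sum(rels[:k]) / k


def ndcg_at_k(rels: Sequence[int], k: int = 20) -> float:
    """Binary-gain nDCG@k with a log2 discount; the ideal ranking re-sorts the same labels."""
    def dcg(xs: Sequence[int]) -> float:
        return sum(x / math.log2(pos + 2) for pos, x in enumerate(xs[:k]))
    ideal = dcg(sorted(rels, reverse=True))
    return dcg(rels) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Relevance labels of the All method for "buy pizza" (Table 2).
    all_buy_pizza = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1]
    print(precision_at_k(all_buy_pizza), round(ndcg_at_k(all_buy_pizza), 3))
```

The first printed value, 0.8, equals the p@20 reported for All on “buy pizza” in Table 1.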

Overall, all proposed methods achieved a higher average precision than Baseline, and Place Only obtained the highest average scores over all queries.

The All method achieved its highest precision for the queries “enjoy afternoon tea” and “buy pizza”. For these queries, All outperformed Baseline and No Expansion in precision, and No Expansion in nDCG. When the query was “buy computer”, all methods obtained relatively low precision. Even for this difficult search task, All and Place Only matched the precision of Baseline and achieved higher precision and nDCG than Word Only and No Expansion.

Table 1. Evaluation results of the five methods for the nine queries

Query               | All (proposed) | Place Only  | Word Only   | No Expansion | Baseline
                    | p@20   nDCG    | p@20   nDCG | p@20   nDCG | p@20   nDCG  | p@20   (# found)
Guitar Practice     | 0.30   0.40    | 0.35   0.43 | 0.40   0.54 | 0.45   0.57  | 0.15   4
Buy Computer        | 0.45   0.59    | 0.45   0.59 | 0.35   0.38 | 0.40   0.41  | 0.45   49
Fix Computer        | 0.70   0.76    | 0.75   0.79 | 0.70   0.76 | 0.75   0.79  | 0.65   13
Eat Pizza           | 0.75   0.64    | 0.80   0.68 | 0.85   0.84 | 0.80   0.81  | 0.95   466
Buy Pizza           | 0.80   0.87    | 0.75   0.84 | 0.70   0.68 | 0.70   0.68  | 0.65   32
Catch a Fish        | 0.25   0.27    | 0.25   0.28 | 0.25   0.35 | 0.25   0.32  | 0.25   23
Have a BBQ          | 0.70   0.66    | 0.75   0.68 | 0.60   0.58 | 0.50   0.48  | 0.30   124
Enjoy Afternoon Tea | 0.90   0.94    | 0.90   0.94 | 0.90   0.79 | 0.80   0.76  | 0.75   91
Swimming            | 0.05   0.03    | 0.20   0.14 | 0.15   0.10 | 0.25   0.21  | 0.20   78
Average             | 0.54   0.57    | 0.58   0.60 | 0.54   0.56 | 0.54   0.56  | 0.48   -

As an example of a query for which the proposed method works well, Table 2 shows the search results of each method for the query “buy pizza”. In addition to Italian restaurants, the proposed method found many supermarkets and other establishments where people can buy a pizza to take home, even if they cannot eat it in the shop.

As an example of a query for which the proposed method did not perform as well as the comparison methods, Table 3 shows the results for the query “guitar practice”. Except for Baseline, most methods found music stores, music schools, and music studios for this query.

5. Discussion

This section discusses the nature of each method and the usefulness of the search results. To discuss the nature of the proposed methods based on the experimental results, we first compare the advantages of each method. On average, Place Only was the most effective method in both precision and nDCG. Method All, with all expansions added, showed a higher average precision than Baseline. In terms of average nDCG, All and Place Only outperformed No Expansion, while Word Only performed on par with it.

Next, we discuss the quality of the obtained results. The proposed method was able to find many places that Baseline did not find, and many of these places were judged suitable for the purpose. The actual search results contained different places depending on the expansions used, which suggests that each expansion contributed to finding additional relevant places.

We now focus on the cases in which the proposed method did not work effectively. When the search task itself was too difficult or, conversely, too easy, all our methods were relatively ineffective. For instance, for the task of finding a place suitable for eating pizza, conventional methods could already find a large number of places. In such cases, finding additional places by inference reduced precision instead.

Finally, we discuss individual cases. An example where the expansion by place type worked properly is the search task “buy computer”. In this task, our method inferred that computers can be bought at electronics stores. Even when a store's reviews did not mention computers, our method was able to infer that the store sells computers by using place type metadata.

Table 2. The top 20 results for each method and their relevance (Rel) to the query “buy pizza” (translated from Japanese). Rel = 1: relevant, Rel = 0: not relevant.

Rank | Rel | All (proposed) | Rel | Place Only | Rel | Word Only
1 | 1 | Dominos Pizza @ Awaji | 1 | Dominos Pizza @ Awaji | 1 | Italian Restaurant EATALY
2 | 1 | Dominos Pizza @ Ebisu | 1 | Dominos Pizza @ Shinjuku | 0 | Restaurant Fiorentina
3 | 1 | Dominos Pizza @ Shinjuku | 1 | Dominos Pizza @ Ebisu | 1 | Italian Restaurant Picard @ Azabu
4 | 1 | Dominos Pizza @ Asakusa | 1 | Dominos Pizza @ Asakusa | 1 | Italian Restaurant IL PANZEROTTO
5 | 1 | Precce Shibuya DELIMARKET | 1 | Precce Shibuya DELIMARKET | 0 | Cafeteria Espresso D Works
6 | 1 | Supermarkets OK @ Hatsudai | 1 | Seijo Ishii Convenience Store @ Kojimachi | 1 | Chronic Tacos BLAST!
7 | 1 | Seijo Ishii Convenience Store @ Ikejiri | 1 | Supermarkets OK @ Hatsudai | 1 | Dominos Pizza @ Awaji
8 | 1 | Seijo Ishii Convenience Store @ Kojimachi | 1 | Seijo Ishii Convenience Store @ Ikejiri | 1 | Dominos Pizza @ Ebisu
9 | 1 | Italian Restaurant IL FELICE | 1 | Italian Restaurant IL FELICE | 1 | Italian Restaurant Pour-kur
10 | 0 | Book Store Majutsu-Dou | 0 | Book Store Majutsu-Dou | 0 | Seveneleven @ Ebisu
11 | 1 | Italian Restaurant Picard @ Azabu | 1 | Italian Restaurant Picard @ Azabu | 1 | Pizza k
12 | 1 | Italian Restaurant EATALY | 1 | Italian Restaurant EATALY | 1 | Italian Restaurant Picard
13 | 1 | Shibuya Cheese Stand | 1 | Shibuya Cheese Stand | 1 | Precce Shibuya DELIMARKET
14 | 1 | Italian Restaurant Pour-kur | 1 | Italian Restaurant Pour-kur | 0 | Restaurant Rapopo Farm @Yotsuya
15 | 0 | Restaurant Rapopo Farm @Yotsuya | 0 | Restaurant Rapopo Farm @Yotsuya | 1 | Shibuya Cheese Stand
16 | 1 | Delifrance Express | 1 | Delifrance Express | 0 | Book Store Majutsu-Dou
17 | 0 | Bar SHUGAR MARKET | 0 | Bar SHUGAR MARKET | 1 | Supermarkets OK @ Hatsudai
18 | 0 | Italian Restaurant Fiorentina | 0 | Italian Restaurant Fiorentina | 1 | Neapolitan Pizzeria 800 Degrees
19 | 1 | Italian Restaurant IL PANZEROTTO | 1 | Italian Restaurant IL PANZEROTTO | 0 | TENOHA &STYLE
20 | 1 | Chronic Tacos BLAST! | 0 | Cafeteria Espresso D Works | 1 | Seijo Ishii Convenience Store @ Ikejiri
# Relevant | 16 | | 15 | | 14 |

Rank | Rel | No Expansion | Rel | Baseline
1 | 0 | Restaurant Fiorentina | 1 | Italian Restaurant IL FELICE
2 | 1 | Italian Restaurant Picard @ Azabu | 1 | Dominos Pizza @ Awaji
3 | 1 | Italian Restaurant EATALY | 0 | Italian Restaurant virage
4 | 1 | Italian Restaurant Pour-kur | 1 | Dominos Pizza @ Shinjuku
5 | 0 | Cafeteria Espresso D Works | 0 | Garlic Restaurant Goemon
6 | 1 | Italian Restaurant Picard | 0 | Restaurant Rapopo Farm @Yotsuya
7 | 1 | Italian Restaurant IL PANZEROTTO | 1 | Italian Restaurant Pour-kur
8 | 0 | Garlic Restaurant Goemon | 1 | Italian Restaurant EATALY
9 | 1 | Shibuya Cheese Stand | 1 | Supermarkets OK @ Hatsudai
10 | 1 | Pizza k | 1 | Delifrance Express
11 | 0 | Restaurant Rapopo Farm @Yotsuya | 1 | Shibuya Cheese Stand
12 | 1 | Chronic Tacos BLAST! | 0 | Restaurant Fiorentina
13 | 1 | Dominos Pizza @ Awaji | 0 | Bar SHUGAR MARKET
14 | 1 | Delifrance Express | 1 | Precce Shibuya DELIMARKET
15 | 1 | Precce Shibuya DELIMARKET | 1 | Italian Restaurant Picard @ Azabu
16 | 1 | Neapolitan Pizzeria 800 Degrees | 1 | Seijo Ishii Convenience Store @ Ikejiri
17 | 0 | German Wine Bar Yuun Akasaka | 1 | Italian Restaurant Picard
18 | 1 | Supermarkets OK @ Hatsudai | 0 | Cafeteria Espresso D Works
19 | 0 | Seveneleven @ Ebisu | 1 | Dominos Pizza @ Ebisu
20 | 1 | Dominos Pizza @ Ebisu | 0 | Book Store Majutsu-Dou
# Relevant | 14 | | 13 |

Similarly, the expansion by place type was highly effective for the query “have a BBQ”. The search results of the traditional method contained a lot of noise, such as reviews about “purchased BBQ sauce-flavored food”. Inference by place type, for example from barbecue sites or campgrounds, was effective: restaurants offering barbecue sauce-flavored food were ranked lower, because among the places with reviews about BBQ there were only a few such restaurants and more campgrounds.

For some queries, the proposed method had a lower precision than the baseline method. However, the search results for these queries included places that were not found by traditional methods. For example, for the query “guitar practice”, Baseline found only four places, and the three relevant ones were all music schools, because only these places contained the query words directly in their reviews. The No Expansion method found more music schools. With the expansions, it was also possible to find shops, such as music stores that offered guitar lessons or had a performance space attached. In these cases, combining the expansions for both words and places made it possible to rank more suitable places.

From these results, we see that the method that applied only place type-based inference (Place Only) had the highest performance. However, each expansion has different strengths.

Table 3. The top 20 results for each method and their relevance (Rel) to the query “guitar practice” (translated from Japanese). Rel = 1: relevant, Rel = 0: not relevant, “-”: no result (Baseline returned only four places).

Rank | Rel | All (proposed) | Rel | Place Only | Rel | Word Only
1 | 0 | Instrument Store IKEBE Drum @ Shibuya | 0 | Instrument Store IKEBE Drum @ Shibuya | 1 | Music School Mion @ Nakano
2 | 1 | Music School Mion @ Nakano | 1 | Music School Mion @ Nakano | 1 | Voice Training School Akihabara
3 | 1 | Guitar School Cyta.jp @ Shinjuku | 1 | Voice Training School Akihabara | 1 | Guitar School Cyta.jp @ Shinjuku
4 | 1 | Voice Training School Akihabara | 1 | Guitar School Cyta.jp @ Shinjuku | 0 | Instrument Store IKEBE Drum @ Shibuya
5 | 1 | Instrument Store Shimamura @ Shinjuku | 1 | Instrument Store Shimamura @ Shinjuku | 0 | Psychiatry Medical Switch
6 | 0 | Acoustic Guitar Shop Hobo’s | 0 | Acoustic Guitar Shop Hobo’s | 1 | Voice Training School Yoyogi
7 | 0 | Guitar Shop Acoustic Planet | 0 | Guitar Shop Acoustic Planet | 1 | Music School Wood Shinjuku
8 | 1 | Music School JBG | 0 | Instrument Store Ishibashi @ Shibuya | 1 | Instrument Store Shimamura @ Shinjuku
9 | 0 | Instrument Store Ishibashi @ Shibuya | 0 | Instrument Store Lock-In Guitar & Drum | 0 | Instrument Store Lock-In Guitar & Drum
10 | 0 | Instrument Store Lock-In Guitar & Drum | 0 | Music Store Yamano Odakyu @ Shinjuku | 0 | Jazz Club Body and Soul
11 | 0 | Instrument Store Da Vinci Violin | 1 | Music School JBG | 0 | Parking MTG Akasaka
12 | 0 | Music Store Yamano Odakyu @ Shinjuku | 0 | Instrument Store Ochanomizu Gakki | 1 | Piano Bar Rocinante
13 | 0 | Piano Store Grand Gallery Tokyo | 1 | Music Store Kurosawa Japan | 0 | English Language School Global Square
14 | 0 | Instrument Store Ochanomizu Gakki | 0 | Instrument Store Da Vinci Violin | 0 | English School Joshua
15 | 1 | Music Store Kurosawa Japan | 0 | Piano Store Grand Gallery Tokyo | 0 | Vocal School Powerful Voice Shibuya
16 | 0 | Music Studio Korakuen | 0 | Instrument Store Shimokura Violin | 1 | Music School Bee Shinjuku
17 | 0 | Instrument Store Shimokura Violin | 0 | Guitar Shop Music Plaza Daikanyama Main Store | 0 | Golf School Roots Gaien
18 | 0 | Guitar Shop Music Plaza Daikanyama Main Store | 0 | Music Shop Ukulele Planet | 0 | Animation Academy Yoyogi Tokyo
19 | 0 | Guitar Shop Grandy & Jungle | 0 | Ukulele Shop Tantan @ Ochanomizu | 0 | Study abroad agency Admani
20 | 0 | Music Shop Ukulele Planet | 1 | Instrument Store Yamaha @ Ginza | 0 | Programming School GFTD.
# Relevant | 6 | | 7 | | 8 |

Rank | Rel | No Expansion | Rel | Baseline
1 | 1 | Music School Mion @ Nakano | 1 | Music School Mion @ Nakano
2 | 1 | Voice Training School Akihabara | 1 | Guitar School Cyta.jp @ Shinjuku
3 | 1 | Guitar School Cyta.jp @ Shinjuku | 1 | Voice Training School Akihabara
4 | 0 | Instrument Store IKEBE Drum @ Shibuya | 0 | Instrument Store IKEBE Drum @ Shibuya
5 | 1 | Voice Training School Yoyogi | - | -
6 | 0 | Instrument Store Lock-In Guitar & Drum | - | -
7 | 0 | Psychiatry Medical Switch | - | -
8 | 1 | Music School Wood Shinjuku | - | -
9 | 1 | Instrument Store Shimamura @ Shinjuku | - | -
10 | 1 | Piano Bar Rocinante | - | -
11 | 0 | Jazz Club Body and Soul | - | -
12 | 1 | Music School Bee Shinjuku | - | -
13 | 0 | Golf School Roots Gaien | - | -
14 | 0 | Vocal School Powerful Voice Shibuya | - | -
15 | 0 | Guitar Shop Acoustic Planet | - | -
16 | 0 | English School Joshua | - | -
17 | 0 | Parking MTG Akasaka | - | -
18 | 0 | Acoustic Guitar Shop Hobo’s | - | -
19 | 0 | Golf School Dream @ Ginza | - | -
20 | 1 | TOKYO AKIBA MUSIC SC | - | -
# Relevant | 9 | | 3 |

6. Conclusion

In this research, we proposed a new search algorithm that ranks the places where a given purpose can be achieved. In a conventional retrieval system, searchers have to input the type of business or the characteristics of the place as a query, which makes it difficult to search for a place by purpose, such as a place for “guitar practice”. By using place reviews from services such as Google Maps, we made the search system able to accept the purpose directly, and by extending the graph with three types of hypotheses, we enabled searchers to find places by inputting their purpose. We implemented a web application based on the Random Walk with Restart-based graph analysis method. The experimental results show that our method achieves a higher average precision than a conventional keyword-matching baseline and finds suitable places that the baseline misses.

One future challenge is to improve the accuracy of the search results. The amount of computation is another important problem: our method requires constructing a graph and running the convergence computation each time a query is entered. To operate the search model as an actual Web service, its speed must be improved, for example by grouping similar places and purposes in advance. Further research is needed to realize such a search as an actual Web service.

Acknowledgements

This work was supported by JSPS KAKENHI Grant Numbers 22H03905, 21H03774, 21H03775, 18K18161, and 18H03243, and by ROIS NII Open Collaborative Research 2022 Grant Number 22S1001.

References

1. Abdalghani Abujabal, Rishiraj Saha Roy, Mohamed Yahya, and Gerhard Weikum. ComQA: A

community-sourced dataset for complex factoid question answering with paraphrase clusters. In

Proceedings of NAACL-HLT, pages 307–317, 2019.

2. Ramesh Baral, XiaoLong Zhu, S. S. Iyengar, and Tao Li. REEL: Review aware explanation of

location recommendation. In Proceedings of the 26th Conference on User Modeling, Adaptation

and Personalization, UMAP ’18, pages 23–32, New York, NY, USA, 2018. Association for Computing

Machinery.

3. Ramesh Baral, XiaoLong Zhu, S. S. Iyengar, and Tao Li. REEL: Review aware explanation of

location recommendation. In Proceedings of the 26th Conference on User Modeling, Adaptation

and Personalization, UMAP ’18, pages 23–32, New York, NY, USA, 2018. Association for Computing

Machinery.

4. Sandro Bauer, Filip Radlinski, and Ryen W White. Where can I buy a boulder?: Searching for

offline retail locations. In Proceedings of the 25th International Conference on World Wide Web,

pages 1225–1235. International World Wide Web Conferences Steering Committee, 2016.

5. Hongbo Chen, Mohammad Shamsul Areﬁn, Zhiming Chen, and Yasuhiko Morimoto. Place rec-

ommendation based on users check-in history for location-based services. International Journal

of Networking and Computing, 3(2):228–243, 2013.

6. Xiaowen Dong, Dimitrios Mavroeidis, Francesco Calabrese, and Pascal Frossard. Multiscale event

detection in social media. Data Mining and Knowledge Discovery, 29(5):1374–1405, 2015.

7. Ramaswamy Hariharan, Bijit Hore, Chen Li, and Sharad Mehrotra. Processing spatial-keyword

(SK) queries in geographic information retrieval (GIR) systems. In 19th International Conference

on Scientific and Statistical Database Management (SSDBM 2007), pages 16–16. IEEE, 2007.

8. Jiwoon Jeon, W. Bruce Croft, and Joon Ho Lee. Finding similar questions in large question

and answer archives. In Proceedings of the 14th ACM International Conference on Information

and Knowledge Management, CIKM ’05, pages 84–90, New York, NY, USA, 2005. Association for

Computing Machinery.

9. Shuhui Jiang, Xueming Qian, Jialie Shen, Yun Fu, and Tao Mei. Author topic model-based

collaborative filtering for personalized POI recommendations. IEEE Transactions on Multimedia,

17(6):907–918, 2015.

10. Christopher B Jones and Ross S Purves. Geographical information retrieval. International Journal

of Geographical Information Science, 22(3):219–228, 2008.

11. Makoto P. Kato, Satoshi Oyama, Hiroaki Ohshima, and Katsumi Tanaka. Query by example for

geographic entity search with implicit negative feedback. In Proceedings of the 4th International

Conference on Ubiquitous Information Management and Communication, ICUIMC ’10, New York,

NY, USA, 2010. Association for Computing Machinery.

12. Hisao Katsumi, Wataru Yamada, and Keiichi Ochiai. Generic POI recommendation. In Adjunct

Proceedings of the 2020 ACM International Joint Conference on Pervasive and Ubiquitous

Computing and Proceedings of the 2020 ACM International Symposium on Wearable Computers,

UbiComp-ISWC ’20, pages 46–49, New York, NY, USA, 2020. Association for Computing Machinery.

13. Takeshi Kurashima, Taro Tezuka, and Katsumi Tanaka. Blog map of experiences: Extracting and

geographically mapping visitor experiences from urban blogs. In International Conference on Web

Information Systems Engineering, pages 496–503. Springer, 2005.

14. Yui Maekawa, Yoshiyuki Shoji, and Martin J. Dürst. How to find a place suitable for guitar

practice: Purpose-oriented geographic entity retrieval by using online review graph analysis. In

The 23rd International Conference on Information Integration and Web Intelligence, iiWAS2021,

pages 115–122, New York, NY, USA, 2021. Association for Computing Machinery.

15. Thomas Mandl, Paula Carvalho, Giorgio Maria Di Nunzio, Fredric Gey, Ray R Larson, Diana

Santos, and Christa Womser-Hacker. GeoCLEF 2008: The CLEF 2008 cross-language geographic

information retrieval track overview. In Workshop of the Cross-Language Evaluation Forum for

European Languages, pages 808–821. Springer, 2008.

16. Barak Pat, Yaron Kanza, and Mor Naaman. Geosocial search: Finding places based on geotagged

social-media posts. In Proceedings of the 24th International Conference on World Wide Web,

pages 231–234. ACM, 2015.

17. Suppanut Pothirattanachaikul, Takehiro Yamamoto, Sumio Fujita, Akira Tajima, Katsumi

Tanaka, and Masatoshi Yoshikawa. Mining alternative actions from community Q&A corpus. Journal

of Information Processing, 26:427–438, 2018.

18. Ross S. Purves, Paul Clough, Christopher B. Jones, Mark H. Hall, and Vanessa Murdock. Geo-

graphic Information Retrieval: Progress and Challenges in Spatial Search of Text. 2018.

19. Yoshiyuki Shoji, Katsurou Takahashi, Martin J. Dürst, Yusuke Yamamoto, and Hiroaki Ohshima.

Location2vec: Generating distributed representation of location by using geo-tagged microblog

posts. In International Conference on Social Informatics, pages 261–270. Springer, 2018.

20. Isabelle Stanton, Samuel Ieong, and Nina Mishra. Circumlocution in diagnostic medical queries.

In Proceedings of the 37th International ACM SIGIR Conference on Research & Development

in Information Retrieval, SIGIR ’14, pages 133–142, New York, NY, USA, 2014. Association for

Computing Machinery.

21. Kristin Stock. Mining location from social media: A systematic review. Computers, Environment

and Urban Systems, 71:209–240, 2018.

22. Hao Wang, Manolis Terrovitis, and Nikos Mamoulis. Location recommendation in location-based

social networks using user check-in data. In Proceedings of the 21st ACM SIGSPATIAL Interna-

tional Conference on Advances in Geographic Information Systems, pages 374–383. ACM, 2013.

23. Kai Wang, Zhaoyan Ming, and Tat-Seng Chua. A syntactic tree matching approach to finding

similar questions in community-based QA services. In Proceedings of the 32nd International ACM

SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’09, pages 187–194,

New York, NY, USA, 2009. Association for Computing Machinery.

24. Qing Zeng, Tony Tse, Guy Divita, Alla Keselman, Jonathan Crowell, Allen Browne, Sergey Goryachev,

Long Ngo, et al. Term identification methods for consumer health vocabulary development.

Journal of Medical Internet Research, 9(1):e606, 2007.

25. Yihong Zhang, Panote Siriaraya, Yukiko Kawai, and Adam Jatowt. Automatic latent street type

discovery from web open data. Information Systems, 92:101536, 2020.