Overthunk

Datalog + Encoded Slurping

Jul 14, 2020


Contents

LearnDatalog Solutions

Chapter 1 - Basic Queries

  1. Find the entity ids of movies made in 1987
[:find ?e
 :where
 [?e :movie/year 1987]]
  1. Find the entity-id and titles of movies in the database
[:find ?e ?title
 :where
 [?e :movie/title ?title]]
  1. Find the name of all people in the database
[:find ?name
 :where
 [?e :person/name ?name]]

Chapter 2 - Data Patterns

  1. Find movie titles made in 1985
[:find ?title
 :where
 [?m :movie/year 1985]
 [?m :movie/title ?title]]
  1. What year was “Alien” released?
[:find ?year
 :where
 [?m :movie/title "Alien"]
 [?m :movie/year ?year]]
  1. Who directed RoboCop? You will need to use [<movie-eid> :movie/director <person-eid>] to find the director for a movie.
[:find ?name
 :where
 [?m :movie/title "RoboCop"]
 [?m :movie/director ?p]
 [?p :person/name ?name]]
  1. Find directors who have directed Arnold Schwarzenegger in a movie.
[:find ?name
 :where
 [?p :person/name "Arnold Schwarzenegger"]
 [?m :movie/cast ?p]
 [?m :movie/director ?p1]
 [?p1 :person/name ?name]]

Chapter 3 - Parametrized Queries

  1. Find movie title by year
[:find ?title
 :in $ ?year
 :where
 [?m :movie/year ?year]
 [?m :movie/title ?title]]
  1. Given a list of movie titles, find the title and the year that movie was released.
[:find ?title ?year
 :in $ [?title ...]
 :where
 [?m :movie/title ?title]
 [?m :movie/year ?year]]
  1. Find all movie ?titles where the ?actor and the ?director has worked together
[:find ?title
 :in $ ?actor ?director
 :where
 [?p :person/name ?director]
 [?m :movie/director ?p]
 [?m :movie/cast ?p1]
 [?p1 :person/name ?actor]
 [?m :movie/title ?title]]
  1. Write a query that, given an actor name and a relation with movie-title/rating, finds the movie titles and corresponding rating for which that actor was a cast member.
[:find ?title ?rating
 :in $ ?actor [[?title ?rating]]
 :where
 [?p :person/name ?actor]
 [?m :movie/cast ?p]
 [?m :movie/title ?title]]

Chapter 4 - More queries

datom - [eid attr val tx]

Get the attribute keywords using the :db/ident attribute.

Only attribute associated with a transaction is :db/txInstant which is the instant in time when the tx was committed to the database.

  1. What attributes are associated with a given movie.
[:find ?attr
 :in $ ?title
 :where
 [?m :movie/title ?title]
 [?m ?a]
 [?a :db/ident ?attr]]
  1. Find the names of all people associated with a particular movie (i.e. both the actors and the directors)
[:find ?name
 :in $ ?title [?attr ...]
 :where
 [?m :movie/title ?title]
 [?m ?attr ?p]
 [?p :person/name ?name]]
  1. Find all available attributes, their type and their cardinality. This is essentially a query to find the schema of the database. To find all installed attributes you must use the :db.install/attribute attribute. You will also need to use the :db/valueType and :db/cardinality attributes as well as :db/ident.
[:find ?attr ?type ?card
 :where
 [_ :db.install/attribute ?a]
 [?a :db/valueType ?t]
g [?a :db/cardinality ?c]
 [?a :db/ident ?attr]
 [?t :db/ident ?type]
 [?c :db/ident ?card]]
  1. When was the seed data imported into the database? Grab the transaction of any datom in the database, e.g., [_ :movie/title _ ?tx] and work from there.
[:find ?inst
 :where
 [_ :movie/title _ ?tx]
 [?tx :db/txInstant ?inst]]

Chapter 5 - Predicates

  1. Find movies older than a certain year (inclusive)
[:find ?title
 :in $ ?year
 :where
 [?m :movie/title ?title]
 [?m :movie/year ?y]
 [(<= ?y ?year)]]
  1. Find actors older than Danny Glover
[:find ?actor
 :where
 [?p1 :person/name "Danny Glover"]
 [?p2 :person/name ?actor]
 [?p1 :person/born ?b1]
 [?p2 :person/born ?b2]
 [_ :movie/cast ?p2]
 [(> ?b1 ?b2)]
 [?p2 :person/name ?actor]]
  1. Find movies newer than ?year (inclusive) and has a ?rating higher than the one supplied
[:find ?title
 :in $ ?year ?rating [[?title ?rating1]]
 :where
 [(< ?rating ?rating1)]
 [?m :movie/year ?y]
 [(<= ?year ?y)]
 [?m :movie/title ?title]]

Chapter 6 - Transformation Functions

  1. Find people by age. Use the function tutorial.fns/age to find the age given a birthday and a date representing “today”.
[:find ?name
 :in $ ?age ?today
 :where
 [?p :person/name ?name]
 [?p :person/born ?bday]
 [(tutorial.fns/age ?bday ?today) ?age]]
  1. Find people younger than Bruce Willis and their ages.
[:find ?name ?age
 :in $ ?today
 :where
 [?p1 :person/name "Bruce Willis"]
 [?p1 :person/born ?b1]
 [?p2 :person/name ?name]
 [?p2 :person/born ?b2]
 [(tutorial.fns/age ?b1 ?today) ?bruceage]
 [(tutorial.fns/age ?b2 ?today) ?age]
 [(< ?age ?bruceage)]
 [?p :person/name ?name]]
  1. The birthday paradox states that in a room of 23 people there is a 50% chance that someone has the same birthday. Write a query to find who has the same birthday. Use the < predicate on the names to avoid duplicate answers. You can use (the deprecated) .getDate and .getMonth java Date methods.
[:find ?name-1 ?name-2
 :where
 [?p1 :person/name ?name-1]
 [?p1 :person/born ?bday1]
 [?p2 :person/name ?name-2]
 [?p2 :person/born ?bday2]
 [(< ?name-1 ?name-2)]
 [(.getDate ?bday1) ?bdate1]
 [(.getMonth ?bday1) ?bm1]
 [(.getDate ?bday2) ?bdate2]
 [(.getMonth ?bday2) ?bm2]
 [(= ?bdate1 ?bdate2)]
 [(= ?bm1 ?bm2)]]

The given solution is -

[:find ?name-1 ?name-2
 :where
 [?p1 :person/name ?name-1]
 [?p2 :person/name ?name-2]
 [?p1 :person/born ?born-1]
 [?p2 :person/born ?born-2]
 [(.getMonth ?born-1) ?m]
 [(.getMonth ?born-2) ?m]
 [(.getDate ?born-1) ?d]
 [(.getDate ?born-2) ?d]
 [(< ?name-1 ?name-2)]]

The difference between this and my implementation is that the chapter solution uses the data pattern to match months and dates rather than an outright comparison.

Chapter 7 - Aggregates

Aggregate functions like sum and max are written in the :find clause.

  1. count th number of movies in the database
[:find (count ?m)
 :where
 [?m :movie/title]]
  1. Find the birth date of the oldest person in the database
[:find (min ?date)
 :where
 [?p :person/born ?date]]
  1. Given a collection of actors and (the now familiar) ratings data. Find the average rating for each actor. The query should return the actor name and the avg rating.
[:find ?actor (avg ?rating)
 :in $ [?actor ...] [[?title ?rating]]
 :where
 [?m :movie/title ?title]
 [?m :movie/cast ?p]
 [?p :person/name ?actor]]

Chapter 8 - Rules

  1. Write a rule [movie-year ?title ?year] where ?title is the title of some movie and ?year is that movies release year.

Query:

[:find ?title
 :in $ %
 :where
 [movie-year ?title 1991]]

Rules:

[[(movie-year ?title ?year)
  [?m :movie/title ?title]
  [?m :movie/year ?year]]]
  1. Two people are friends if they have worked together in a movie. Write a rule [friends ?p1 ?p2] where p1 and p2 are person entities. Try with a few different ?name inputs to make sure you got it right. There might be some edge cases here.

Query:

[:find ?friend
 :in $ % ?name
 :where
 [?p1 :person/name ?name]
 (friends ?p1 ?p2)
 [?p2 :person/name ?friend]]

Rules:

[[(friends ?p1 ?p2)
  [?m :movie/cast ?p1]
  [?m :movie/cast ?p2]
  [(not= ?p1 ?p2)]]
 [(friends ?p1 ?p2) [?m :movie/cast ?p1] [?m :movie/director ?p2]]
 [(friends ?p1 ?p2) (friends ?p2 ?p1)]]
  1. Write a rule [sequels ?m1 ?m2] where ?m1 and ?m2 are movie entities. You’ll need to use the attribute :movie/sequel. To implement this rule correctly you can think of the problem like this: A movie ?m2 is a sequel of ?m1 if either

    ?m2 is the “direct” sequel of m1 or ?m2 is the sequel of some movie ?m and that movie ?m is the sequel to ?m1.

There are (at least) three different ways to write the above query. Try to find all three solutions.

Query:

[:find ?sequel
 :in $ % ?title
 :where
 [?m :movie/title ?title]
 (sequels ?m ?s)
 [?s :movie/title ?sequel]]

Rules:

[[(sequels ?m1 ?m2) [?m1 :movie/sequel ?m2]]
 [(sequels ?m1 ?m2) [?m :movie/sequel ?m2] (sequels ?m1 ?m)]]]

Chapter 9 Exercises

  1. Write a function that takes a string as an argument and searches for it on Bing and Google using the slurp function. Your function should return the HTML of the first page returned by the search.
(defmacro ustr
  "URL Encodes the string"
  [& s]
  `(URLEncoder/encode (str ~@s)))

(defn construct-search-url
  "Given the name of a search engine, Constructs a valid search URL"
  [engine]
  (str "https://" engine ".com/search?q%3D")) ;; Have to url encode the space

(defn search-on-web
  [search-string]
  (let [encoded-search (ustr search-string)
        first-page (promise)]
    (doseq [x ["bing" "google"]]
      (let [search-url (str (construct-search-url x) encoded-search)]
        (future (deliver first-page {:page (slurp search-url) :engine x}))))
    (deref first-page)))

(def srch (search-on-web "athens"))
  1. Update your function so it takes a second argument consisting of the search engines to use.
(defn search-on-web
  [search-string engines]
  (let [encoded-search (ustr search-string)
        first-page (promise)]
    (doseq [x engines]
      (let [search-url (str (construct-search-url x) encoded-search)]
        (future (deliver first-page {:page (slurp search-url) :engine x}))))
    (deref first-page)))

(def srch (search-on-web "I am a doo doo head" ["bing" "google"]))
(:engine srch)
;; => "google"
  1. Create a new function that takes a search term and search engines as arguments, and returns a vector of the URLs from the first page of search results from each search engine.
(def url-regex #"(?i)\b((?:([a-z][\w-]+:(?:/{1,3}|[a-z0-9%]))|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))")

(defn first-page-urls
  "Takes a search term, search engines and returns a vector of URLS from first page from each search engine"
  [search-string engines]
  (let [encoded-search (ustr search-string)
        first-page-urls (atom {})]
    (doseq [x engines]
      (let [search-url (str (construct-search-url x) encoded-search)]
        (future (swap! first-page-urls assoc (keyword x)
                       (map first (re-seq url-regex (slurp search-url)))))))
    first-page-urls))

(def srch (first-page-urls "Athens Research" ["bing" "google"]))

Takeaways

So turns out the reason why I wasn’t able to slurp google.com yesterday was because I wasn’t URL encoding my search terms. (Thanks to @adrien for that tip). Once I got the URL encode to work by using the java.io.URLEncode library’s encode function, the chapter exercises became pretty easy to solve.

Also I’m pretty glad I finished the DataLog course. I kinda stumbled at the end with the Rules section. I think I need to spend more time with Logic Programming to understand how rules work in general. But apart from that, this also turned out to be pretty easy.

Today’s tally -