Article Collection
Labservatory collects articles from ten different news portals. Ten background jobs run frequently and in parallel to index the latest news items through web crawlers and RSS feeds. The content of each news item is then extracted using artificial intelligence methods and stored in a database.
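The parallel collection step can be pictured with a minimal sketch. It assumes one job per portal and plain RSS 2.0 feeds; the function names (parse_rss, fetch_feed, collect_all) and the portal-to-URL mapping are hypothetical, and the crawler and the AI-based content extraction are not covered here.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor
from typing import Callable
from urllib.request import urlopen


def parse_rss(xml_text: str) -> list[dict[str, str]]:
    """Return the title and link of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    return [
        {
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
        }
        for item in root.iter("item")
    ]


def fetch_feed(url: str) -> str:
    """Download a feed over HTTP and decode it as UTF-8 (an assumption)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")


def collect_all(
    feeds: dict[str, str],
    fetch: Callable[[str], str] = fetch_feed,
) -> dict[str, list[dict[str, str]]]:
    """Run one job per portal in parallel and gather the parsed items."""

    def job(url: str) -> list[dict[str, str]]:
        return parse_rss(fetch(url))

    with ThreadPoolExecutor(max_workers=len(feeds) or 1) as pool:
        futures = {portal: pool.submit(job, url) for portal, url in feeds.items()}
        return {portal: future.result() for portal, future in futures.items()}
```

Passing fetch as a parameter keeps the parsing logic testable without network access; a scheduler would call collect_all at whatever interval the jobs are configured to run.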
Keyword Generation
During parsing and processing, our parser generates keywords for each news item based on its title. It then inserts the keywords into the database and assigns them to the news item. If a keyword is already present in our database, no new record is created and the existing record is assigned instead.
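The "insert only if missing, then assign" behaviour can be sketched as follows. This is an illustration under stated assumptions, not the platform's actual code: the table names, the SQLite backend, and the rule for deriving keywords from a title (lowercased words of at least four characters) are all hypothetical.

```python
import re
import sqlite3


def keywords_from_title(title: str, min_length: int = 4) -> list[str]:
    """Hypothetical rule: lowercased words of at least min_length characters, in order, without repeats."""
    keywords: list[str] = []
    for word in re.findall(r"\w+", title.lower()):
        if len(word) >= min_length and word not in keywords:
            keywords.append(word)
    return keywords


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the assumed keyword table and the item-to-keyword link table."""
    conn.executescript(
        """
        CREATE TABLE keywords (id INTEGER PRIMARY KEY, word TEXT NOT NULL UNIQUE);
        CREATE TABLE news_item_keyword (
            news_item_id INTEGER NOT NULL,
            keyword_id INTEGER NOT NULL REFERENCES keywords(id),
            PRIMARY KEY (news_item_id, keyword_id)
        );
        """
    )


def assign_keywords(conn: sqlite3.Connection, news_item_id: int, title: str) -> None:
    """Store each keyword once and link it to the given news item."""
    for word in keywords_from_title(title):
        # The UNIQUE constraint makes this a no-op when the keyword already exists.
        conn.execute("INSERT OR IGNORE INTO keywords (word) VALUES (?)", (word,))
        (keyword_id,) = conn.execute(
            "SELECT id FROM keywords WHERE word = ?", (word,)
        ).fetchone()
        conn.execute(
            "INSERT OR IGNORE INTO news_item_keyword (news_item_id, keyword_id) VALUES (?, ?)",
            (news_item_id, keyword_id),
        )
    conn.commit()
```

Relying on a uniqueness constraint rather than a prior lookup keeps the operation safe when several collection jobs insert the same keyword at nearly the same time.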
Category Generation
Categories are defined before the parsing step of the article collection, so our system does not generate categories. The predefined categories are a common set of terms taken from the news portals' websites, and they are the following:
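Κύπρος (Cyprus), Ελλάδα (Greece), Διεθνή (International), Πολιτική (Politics), Υγεία (Health), Κυπριακό (Cyprus problem), Οικονομία (Economy), Αθλητικά (Sports), Κοινωνία (Society), Επιστήμη (Science), Επιχειρήσεις (Business), Ενέργεια (Energy), Ψυχαγωγία (Entertainment), Κορονοϊός (Coronavirus), Εκλογές (Elections), Άλλα θέματα (Other topics).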
Analytical Category Generation
Analytical categories, also called internal categories, are managed by users who have the appropriate permission. Such a user can assign a specific internal category to a news item and then search by it.
Search
To provide the best search experience possible, we use a search engine called Meilisearch. To index our platform's news items in Meilisearch, we use Laravel Scout. The search engine tokenises search queries and applies its ranking rules to return the most relevant results. For example, when a search term includes the word "Cyprus", Meilisearch can identify similar words using its tokenisation technique and may return results containing derivatives such as "Cypriot". Articles with exact matches and more repetitions of the word rank higher in the results (relevance order). Search terms can also include phrases, and the same logic applies. For instance, the results for University of Cyprus may include articles that contain any combination of the words "University" and "Cyprus" or their derivatives. Lastly, exact matching is available when the query is wrapped in double quotes, for example "University of Cyprus".
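As an illustration of the two query modes, the sketch below builds a request for Meilisearch's search endpoint (POST /indexes/{index_uid}/search) with and without the surrounding double quotes. The host, the index name news_items, and the helper name are assumptions; the platform itself talks to Meilisearch through Laravel Scout rather than through code like this.

```python
import json
from typing import Optional
from urllib.request import Request

MEILI_URL = "http://localhost:7700"  # assumed host; 7700 is Meilisearch's default port
INDEX_UID = "news_items"             # hypothetical index name


def build_search_request(
    query: str,
    exact_phrase: bool = False,
    limit: int = 20,
    api_key: Optional[str] = None,
) -> Request:
    """Build (but do not send) a search request for the assumed index."""
    # Wrapping the query in double quotes asks Meilisearch for an exact phrase match.
    q = f'"{query}"' if exact_phrase else query
    body = json.dumps({"q": q, "limit": limit}, ensure_ascii=False).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return Request(
        f"{MEILI_URL}/indexes/{INDEX_UID}/search",
        data=body,
        headers=headers,
        method="POST",
    )
```

Sending the request with urllib.request.urlopen and decoding the JSON response yields a "hits" array, which Meilisearch returns already sorted by relevance.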
Advanced Search
The advanced search feature allows users who have access to it to refine their query with specific criteria: Publication Date, Categories, News Portal, Internal Categories, and Keywords. All of these fields are populated during the article collection phase, except Internal Categories (Analytical Categories), which are assigned manually by the researchers.
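One way such criteria could be translated into a Meilisearch filter expression is sketched below. The attribute names (published_at, categories, news_portal, internal_categories, keywords) are hypothetical, the publication date is assumed to be stored as a Unix timestamp, and the attributes would need to be declared filterable in the index settings on a Meilisearch version that supports the IN operator.

```python
from datetime import date, datetime, timedelta, timezone
from typing import Optional, Sequence


def _quote(value: str) -> str:
    """Wrap a value in double quotes, escaping backslashes and embedded quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def _timestamp(day: date) -> int:
    """Unix timestamp of midnight UTC at the start of the given day."""
    return int(datetime(day.year, day.month, day.day, tzinfo=timezone.utc).timestamp())


def build_filter(
    published_from: Optional[date] = None,
    published_to: Optional[date] = None,
    categories: Sequence[str] = (),
    news_portals: Sequence[str] = (),
    internal_categories: Sequence[str] = (),
    keywords: Sequence[str] = (),
) -> str:
    """Combine the chosen criteria with AND; an empty string means no filter."""
    clauses: list[str] = []
    if published_from:
        clauses.append(f"published_at >= {_timestamp(published_from)}")
    if published_to:
        # Exclusive upper bound at the start of the next day keeps the end date inclusive.
        clauses.append(f"published_at < {_timestamp(published_to + timedelta(days=1))}")
    for field, values in (
        ("categories", categories),
        ("news_portal", news_portals),
        ("internal_categories", internal_categories),
        ("keywords", keywords),
    ):
        if values:
            clauses.append(f"{field} IN [{', '.join(_quote(v) for v in values)}]")
    return " AND ".join(clauses)
```

The resulting string would be sent as the "filter" field of the search request, next to the text query in "q".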
Dashboard
The Labservatory web application provides a dashboard for managing users and resources and for viewing system reports.