Building a Corpus and getting relevant articles from a list of articles

Last week we demoed how, from a list of URLs, you can optimize your communication! This week, we will show you:

How to create a Corpus to feed your analytics tools and get more relevant articles from your selection of articles.

In other words: How can I get more of those articles I find relevant for my analytical/survey/study project?

Like last week, we will preserve the confidentiality of those involved in this real business case by not revealing names or original articles.

1/ The case: A study on specific aspects of Sports

The client is running a study on some specific aspects of Sports and gave us a short list of articles they found interesting to explore.

They need more of those articles and, ultimately, want to feed their analytics tools with a Corpus built from relevant sources and always kept up to date.

2/ Learning from the classifications of those articles

As mentioned above, we will not share those articles to preserve the confidentiality of the client.

Here are the top, weighted classifications from the articles list:
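As a rough illustration of what "top, weighted classifications" can mean, the sketch below averages per-article classification weights over a list of articles and ranks the result. It is only an assumption about how such a ranking could be computed, not TrustedOut's actual method; the function name `top_classifications` and the sample weights are hypothetical placeholders, not the client's data.

```python
from collections import defaultdict


def top_classifications(articles, n=3):
    """Average each classification's weight over all articles, return the top n."""
    if not articles:
        return []
    totals = defaultdict(float)
    for weights in articles:
        for label, weight in weights.items():
            totals[label] += weight
    # A classification absent from an article counts as weight 0 for it.
    averaged = {label: total / len(articles) for label, total in totals.items()}
    return sorted(averaged.items(), key=lambda item: item[1], reverse=True)[:n]


# Hypothetical per-article weights (placeholders for illustration only).
sample_articles = [
    {"Fashion": 0.6, "Communication": 0.4},
    {"Fashion": 0.5, "Digital Life": 0.5},
    {"Communication": 0.6, "Digital Life": 0.4},
]

ranking = top_classifications(sample_articles)
for label, weight in ranking:
    print(f"{label} | {weight:.1%}")
```

Averaging rather than counting keeps an article that is only marginally about a topic from weighing as much as one that is mostly about it.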

3/ Creating a Corpus from those classifications

The client told us Sports was the target, so we’ll ask TrustedOut for Sources specialized in all Sports.

We will also add the condition that those sources cover one or more of the top classifications found above: Fashion, Communication and/or Digital Life.

Also, the client wants to use the articles they gave us, which were found in France, to explore a new market: the US.

From a list of French articles to a US-France Corpus
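The three criteria above (specialized in Sports, covering at least one of the top classifications, and published in France or the US) can be read as a simple filter over source profiles. The sketch below shows that logic on made-up data. The `Source` class, its fields and the sample sources are hypothetical and do not reflect the TrustedOut interface.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    name: str
    country: str
    specialty: str                      # main top-level classification
    covers: set = field(default_factory=set)  # other classifications covered


def matches_corpus(source, specialty, any_of, countries):
    """True if the source has the specialty, is in one of the countries,
    and covers at least one classification in any_of."""
    return (
        source.specialty == specialty
        and source.country in countries
        and bool(source.covers & any_of)
    )


# Hypothetical source profiles (placeholders for illustration only).
sources = [
    Source("source-a", "FR", "Sports", {"Fashion", "Luxury"}),
    Source("source-b", "US", "Sports", {"Basketball"}),
    Source("source-c", "US", "Tech", {"Digital Life"}),
    Source("source-d", "US", "Sports", {"Digital Life", "Communication"}),
]

corpus = [
    s for s in sources
    if matches_corpus(
        s,
        specialty="Sports",
        any_of={"Fashion", "Communication", "Digital Life"},
        countries={"US", "FR"},
    )
]
print([s.name for s in corpus])
```

Here `source-b` is dropped because it covers none of the three classifications, and `source-c` because Sports is not its specialty.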


TrustedOut returns 59 media and 96 sources, representing an average of close to 250 articles per day.

Here are 3 examples of sources found for this Corpus and their respective main profiles over the past week:


  • People › Sports › Football And Soccer | 31.8%
  • People › Lifestyle › Fashion | 21.8%
  • People › Lifestyle › Luxury | 17.2%
  • People › Sports › American Football | 7.0%
  • People › Sports › Cycling | 5.9%


  • People › Lifestyle › Fashion | 14.6%
  • People › Entertainment And Leisure › Celebrities | 11.3%
  • People › Culture And Arts › Music | 10.9%
  • People › Culture And Arts › Movies | 4.9%
  • People › Entertainment And Leisure › TV And Video And WebTV | 4.0%

Highlights Football

  • People › Sports › Football And Soccer | 31.5%
  • People › Sports › Table Tennis | 19.6%
  • General › Tech › Software And OS | 12.8%
  • People › Sports › Basketball | 11.3%
  • General › Tech › Digital Life | 10.5%

4/ Reading targeted articles

Let’s get the latest articles from our Corpus.

Below is what the beginning of the list of those articles looks like with URLs, time stamps and classifications for each relevant article.

Fashion-classified articles?

Fashion, as seen above, was the top classification found from the list of articles that were given to us.

How about getting articles from our Corpus classified in Fashion?

Simply select this classification in the list of articles coming from your TrustedOut Corpus! Here are the first 2:

Want to read them?

Les maillots de gardiens 2020-2021 d’Umbro s’inspirent des annees 90

Best Outdoor Gear Deals of the Week | GearJunkie
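Selecting a classification in the article list amounts to filtering on each article's classifications and keeping the most recent matches. A minimal sketch, assuming each article is a record with a URL, a time stamp and a list of classifications (the field names, URLs and dates below are hypothetical placeholders):

```python
from datetime import datetime

# Hypothetical article records (placeholders for illustration only).
articles = [
    {"url": "https://example.com/article-1",
     "published": datetime(2020, 1, 3, 9, 0),
     "classifications": ["Sports", "Fashion"]},
    {"url": "https://example.com/article-2",
     "published": datetime(2020, 1, 4, 14, 30),
     "classifications": ["Sports", "Digital Life"]},
    {"url": "https://example.com/article-3",
     "published": datetime(2020, 1, 5, 8, 15),
     "classifications": ["Fashion"]},
]


def latest_by_classification(items, label, limit=2):
    """Return the most recent articles carrying the given classification."""
    selected = [a for a in items if label in a["classifications"]]
    selected.sort(key=lambda a: a["published"], reverse=True)
    return selected[:limit]


fashion = latest_by_classification(articles, "Fashion")
for article in fashion:
    print(article["published"].isoformat(), article["url"])
```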

Why is it so critical?

The Corpus makes or breaks any analytics.

No matter how smart your analytics algorithm is, if you feed it content that is too scarce, too biased, too outdated, too broad… not only will you get twisted results from your genius algos but, worse, the decisions based on them will be wrong and untrustworthy.

Trust your Corpus to Trust your Decisions.

We’ve shared 2 ways to build a trustworthy Corpus:

Criteria-based Corpus creation:

TrustedOut was made to get content corresponding to profiles you trust for a specific purpose.

Example-based Corpus creation:

These last two posts demoed how you can get more from a list of articles/URLs.

Questions? Reach out!



Published by

Freddy Mini

CEO & Co-founder