Work with Open-Source Intelligence (Part 3): Level-up your investigation by connecting the dots

This article explains how to get information from a name, company, website, or product details. It is part of a series on Open-Source Intelligence (OSINT), an ensemble of techniques to help gather information from publicly available resources, brought to you by Reporters Without Borders (RSF).

Open-Source Intelligence (OSINT) techniques are used to collect, analyse, and utilise publicly available information to support fact-finding and investigation, without having to rely on internal or classified data. Investigation often starts with one lead: a name, a phone number, an email address, a website, or a receipt. In this article, Reporters Without Borders (RSF) introduces the essential research tools and techniques journalists can use to trace one lead to another.

Starting with only a name

A direct search of a person’s or a company’s name in a search engine or on social media is always the best place to start, as a vast amount of personal data is captured by tech giants. What kind of information is available depends on the jurisdiction. Some databases, such as Fast People Search in the United States, can provide individuals’ addresses, employment history, property records, and more.

When searching for information about a business, the most reliable databases are business registries. However, government-owned business registries often have restrictions: requiring national IDs, or authorised credentials to search the database. These restrictions pose an inconvenience to investigators who may not have the required ID, or may not want to use it to protect their anonymity.

Private business registries, such as OpenCorporates, or crowdsourced databases, such as LittleSis, usually have fewer restrictions. Alternatively, databases specifically focussed on publishing leaked data, such as the ICIJ Offshore Leaks database and Distributed Denial of Secrets, have powerful search engines that allow investigators to research names, entities, and business relationships unveiled by major leaks like the Panama Papers, the Pandora Papers, and the Myanmar Investment Commission hacks.

Starting with a phone number or email

Again, the simplest first step to research the people behind a phone number or email address is to use a search engine. Some countries’ business registries are transparent about business owners’ phone numbers or email addresses, and the information will come up in search engine results.
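The same phone number is often written in several different ways online, so a single search can miss results. The following is a minimal sketch, assuming the investigator already knows the country calling code and that the country uses a leading trunk zero (not all do). The function names and the sample number are illustrative only.

```python
import re


def phone_search_variants(country_code: str, national_number: str) -> list[str]:
    """Return quoted search strings for common written forms of one number.

    Assumption: the national number may start with a trunk "0" that is
    dropped in international format. This does not hold for every country.
    """
    cc = re.sub(r"\D", "", country_code)                    # "+33" -> "33"
    nn = re.sub(r"\D", "", national_number).lstrip("0")     # "06 12..." -> "612..."
    variants = [
        f"+{cc}{nn}",    # international, no spaces
        f"+{cc} {nn}",   # international, space after the country code
        f"00{cc}{nn}",   # international with the 00 dialling prefix
        f"0{nn}",        # national form with a trunk zero
        nn,              # bare subscriber digits
    ]
    # Remove duplicates while keeping order, then quote for exact matching.
    return [f'"{v}"' for v in dict.fromkeys(variants)]


def build_or_query(variants: list[str]) -> str:
    """Join the variants into one query using the OR operator."""
    return " OR ".join(variants)


if __name__ == "__main__":
    print(build_or_query(phone_search_variants("+33", "06 12 34 56 78")))
```

The resulting string can be pasted into a search engine. Quoting each variant asks for an exact match, which reduces unrelated results.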

Caller ID apps like Getcontact or Truecaller are designed to block spam calls but, unbeknownst to many users, are also granted access to their phone contacts. As a result, the databases for these apps contain the names and phone numbers of a huge number of people. Investigators can download the apps themselves (making sure not to grant the app access to their own contacts) and search the database for the phone number they are trying to identify. Some private aggregators of public records and online databases, such as Spokeo, also allow searches for individuals by email address or phone number.

Databases such as Have I Been Pwned archive information revealed in data breaches, and allow users to search across their database to see if their credentials have been compromised. Cross-checking known phone numbers or email addresses against these databases can reveal the names behind them if the data was previously exposed. Furthermore, if a phone number or email address was compromised in a breach of a particular platform, for instance Flickr, this means the target had an account on that platform. Investigators could then narrow their search and try to find the connected accounts there.

Starting with only a website name

Combing through every page on a website is time-consuming and often unproductive. Utilising advanced search engine operators, such as searching specifically for PDF or text files on a given website, could reveal internal memos, HR manuals, or even plain-text lists of passwords accidentally uploaded to a public server. Trying different combinations of search terms, like known names, email addresses, phone numbers etc., could help narrow the search.

Historical records of a website provide valuable insights into how content has been modified, deleted, or hidden. Search engines like Bing and Google have kept older copies of web pages, known as “caches,” which can enable investigators to retrieve deleted statements, announcements, and other information. Where a search engine still supports it, search cache:example.com; Google retired this operator in 2024, so also check archived versions of the website using the Wayback Machine.

When a domain name is registered, information about its registrant is recorded. WHOIS Lookup is a tool used to view this registration record, such as the date the domain was registered, the record of ownership, and possibly the registrant’s contact information. It is distinct from the domain’s Domain Name System (DNS) records, which map the name to the servers behind it. Privacy services offered by domain registrars such as GoDaddy anonymise the registrant details, and are commonly used nowadays. However, past records, viewable through Complete DNS, might reveal the registrant’s real details from an earlier version of the website.
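A WHOIS lookup can also be run without a web tool. The sketch below assumes the plain WHOIS protocol (a TCP connection on port 43, with the query followed by a line break) and uses `whois.iana.org` as a starting server, which typically refers the client to the registry responsible for the domain. The function names are hypothetical, and because WHOIS output is not standardised, the parser only collects simple "Key: Value" lines.

```python
import socket


def whois_query(domain: str, server: str = "whois.iana.org", port: int = 43,
                timeout: float = 10.0) -> str:
    """Send one WHOIS query and return the raw text response."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:          # the server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_fields(text: str) -> dict[str, list[str]]:
    """Collect 'Key: Value' lines into a dict of lists, keyed in lower case."""
    fields: dict[str, list[str]] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comment lines, and lines without a key.
        if not line or line.startswith(("%", "#")) or ":" not in line:
            continue
        key, _, value = line.partition(":")
        value = value.strip()
        if value:
            fields.setdefault(key.strip().lower(), []).append(value)
    return fields


if __name__ == "__main__":
    record = parse_fields(whois_query("example.com"))
    print(record.get("refer"))    # registry WHOIS server, if one is given
```

If the response contains a `refer` field, repeating the query against that server usually returns the fuller registration record.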

Starting with trade records or products

Investigative journalists often trace the flow of unethical goods and sanctions violations. In a complex and multi-tiered global supply chain, it is hard to establish a direct connection between producers of goods and their purchasers. Private shipment record aggregators like ImportYeti or Panjiva have collected hundreds of millions of bills of lading, which include the names of the shipper and consignee, country of origin, country of destination, volume of goods, and sometimes product identification numbers (HS codes).

The HS Code, a six-digit identification number, is an internationally accepted classification system for traded goods used for customs purposes. The HS codes on shipping records reveal the type of goods contained in the shipment. On some occasions, shippers might include additional product identification codes. Companies might copy and paste the same product identification code on consumer sites, which allows journalists to match the exact product to a particular shipment batch.
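The six digits of an HS code are hierarchical: the first two identify the chapter, the first four the heading, and all six the subheading. Countries often append further digits for their own tariff lines, so shipping records may show eight or ten digits. The sketch below splits a code into these levels so records from different sources can be compared at the six-digit level. The function names are hypothetical.

```python
import re


def split_hs_code(code: str) -> dict[str, str]:
    """Split an HS code into chapter, heading, subheading, and national suffix."""
    digits = re.sub(r"\D", "", code)     # drop dots and spaces: "0901.21" -> "090121"
    if len(digits) < 6:
        raise ValueError(f"expected at least 6 digits, got {code!r}")
    return {
        "chapter": digits[:2],
        "heading": digits[:4],
        "subheading": digits[:6],
        "national_suffix": digits[6:],   # country-specific digits, may be empty
    }


def same_subheading(code_a: str, code_b: str) -> bool:
    """True when two codes share the same international six-digit subheading."""
    return split_hs_code(code_a)["subheading"] == split_hs_code(code_b)["subheading"]


if __name__ == "__main__":
    print(split_hs_code("0901.21.0010"))
    print(same_subheading("0901.21", "0901210010"))
```

Comparing only the first six digits avoids false mismatches when two countries' records use different national extensions for the same product category.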

← Read Part 1: Work with Open-Source Intelligence (Part 1): Extracting information from online images
← Read Part 2: Work with Open-Source Intelligence (Part 2): Extracting information from online videos
