Overview#
When enrichment providers or LinkedIn don't have the information you need on hard-to-enrich company segments (e.g. brick-and-mortar businesses), public data can often be a goldmine, especially depending on the region you're targeting.
We paired up with JetHR to show you how to use Cargo to find and use publicly available information on their target customers in the European market. By leveraging web scraping and AI, they used this workflow to fill in the gaps in their CRM.
All integrations mentioned in this template require an associated connector to be set up in your Cargo workspace. Some integrations are eligible for use with Cargo credits. See the documentation for instructions on setting up connectors and using Cargo credits.
Scrape web pages and upsert information to the CRM inside a Cargo workflow#
Step 1 - Set variables#
Set up the workflow's input variables
Inputs used in the workflow are set up in the variables node at the beginning of the workflow. This node defines the parameters that will be passed through the rest of the workflow as inputs to the downstream nodes.
To power this workflow, the following variables are needed:
- website: The website of the company you're targeting
- hubspotCompanyID: The ID of the company in your CRM
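As a minimal sketch, the workflow's inputs amount to a small payload like the one below (the values are hypothetical):

```python
# Hypothetical input payload for the variables node.
# Field names mirror the variables defined above; values are examples only.
workflow_inputs = {
    "website": "https://example-company.com",  # website of the target company
    "hubspotCompanyID": "901234567",           # ID of the CRM record to enrich
}
```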
Step 2 - Google search#
Perform a Google search on the company name with the Serper node
Provide the company name and relevant keywords as input to the Serper node to perform a Google search for web pages related to the company.
If you find that the first results of this action are not accurate, try configuring the query with Boolean operators to look for specific keywords, such as "[company name] AND home page", to surface the most relevant links. See this guide on how to use these operators inside Google searches.
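Outside of the Cargo node, the equivalent Serper call looks roughly like the sketch below, assuming a Serper API key from your own account (the endpoint and response shape follow Serper's public API; the query string reuses the Boolean pattern above):

```python
import requests

SERPER_API_KEY = "your-serper-api-key"  # assumption: your own Serper.dev key

def google_search(company_name: str) -> list[dict]:
    """Run a Google search via Serper and return the organic results."""
    response = requests.post(
        "https://google.serper.dev/search",
        headers={"X-API-KEY": SERPER_API_KEY, "Content-Type": "application/json"},
        json={"q": f'"{company_name}" AND home page'},  # Boolean-style query
        timeout=30,
    )
    response.raise_for_status()
    # Serper returns organic results as a list of {title, link, snippet, ...}
    return response.json().get("organic", [])
```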
Step 3 - Scrape webpages#
Scrape web pages for relevant information
The Serper node produces an array of web pages related to our search query. You can either loop through all links to find the most relevant one or simply select the first. Since Google searches often list the company’s homepage first, we’ll choose the latter for this example.
Add an HTTP browse node to the first link from the Serper action to retrieve the HTML content of the homepage. This content will be used for analysis later in the workflow.
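The same two operations (take the first link, fetch its raw HTML) could be sketched as follows; the helper name and request headers are illustrative:

```python
import requests

def fetch_homepage_html(organic_results: list[dict]) -> str:
    """Fetch the raw HTML of the first search result, usually the homepage."""
    first_link = organic_results[0]["link"]  # Google often ranks the homepage first
    response = requests.get(
        first_link,
        headers={"User-Agent": "Mozilla/5.0"},  # some sites reject bare clients
        timeout=30,
    )
    response.raise_for_status()
    return response.text  # HTML content for the AI analysis step
```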
Step 4 - AI analysis#
Analyze HTML content with AI
Next, use one or more OpenAI nodes to parse the HTML content obtained earlier to answer specific questions. This step extracts structured information from unstructured web data. To ensure high-quality output, strictly define the analysis rules in the system prompt of the OpenAI integration, setting clear guardrails for the instructions.
For instance, for a prompt asking to extract any email or phone numbers present on the web page, we have seen the following work well:
Return strictly the email address and phone number from the content below. If there are multiple email addresses, pick the one that is closest to a general company email address, or that of the sales team, e.g. sales@, orders@, info@, contact@.
You could use a similar system prompt to that below:
Strictly return one phone number. Make sure there are no spaces in your response. If you don't find anything, just return n/a.
Use the response format setting to define the schema of an acceptable output, either as a plain string or as a JSON object.
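As a rough equivalent outside Cargo, the OpenAI call with a JSON response format could look like this (the model name is an assumption; any model that supports JSON mode works, and the prompt combines the two examples above):

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM_PROMPT = (
    "Return strictly the email address and phone number from the content below. "
    "If there are multiple email addresses, pick the one closest to a general "
    "company email address, or that of the sales team, e.g. sales@, orders@, "
    "info@, contact@. Make sure there are no spaces in the phone number. "
    'If you don\'t find anything, return n/a. Respond as JSON with keys "email" and "phone".'
)

def extract_contact_info(html: str) -> dict:
    """Ask the model for structured contact details extracted from raw HTML."""
    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any JSON-mode-capable model works
        response_format={"type": "json_object"},  # enforce a JSON object output
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": html},
        ],
    )
    return json.loads(completion.choices[0].message.content)
```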
Step 5 - Map information#
Create arrays for CRM mapping and further enrichment
Once the OpenAI analysis is complete, push the gathered information back into your CRM or Cargo data model. Use a variables node to transform the format of the OpenAI node’s output, if needed.
The outputs of the variables node can then be mapped to your CRM or used for further enrichment processes, ensuring the data is organized and ready for integration. Since web content can change over time, consider adding a timestamp field to trigger this workflow at set intervals.
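Cargo's CRM nodes handle this mapping natively; for reference, the raw HubSpot equivalent is roughly the sketch below, using HubSpot's CRM v3 companies endpoint (`company_email` and `last_enriched_at` are hypothetical property names to replace with your portal's own):

```python
from datetime import datetime, timezone

import requests

HUBSPOT_TOKEN = "your-private-app-token"  # assumption: a HubSpot private-app token

def update_company(hubspot_company_id: str, email: str, phone: str) -> None:
    """PATCH the enriched fields onto the existing company record."""
    response = requests.patch(
        f"https://api.hubapi.com/crm/v3/objects/companies/{hubspot_company_id}",
        headers={"Authorization": f"Bearer {HUBSPOT_TOKEN}"},
        json={
            "properties": {
                "phone": phone,  # standard HubSpot company property
                "company_email": email,  # assumption: custom property in your portal
                # assumption: custom timestamp used to re-trigger the workflow later
                "last_enriched_at": datetime.now(timezone.utc).isoformat(),
            }
        },
        timeout=30,
    )
    response.raise_for_status()
```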
By following these steps, you can efficiently gather and use public information to enrich your company data, enhancing your outreach and engagement efforts.