Spam Filtering — How We Protect Your Form Endpoints

Every day we're filtering.

We receive thousands of submissions every day. To show how much of that is spam and how we make that determination, we took a recent sample of 50,000 submissions. Of those, 76% were classified as spam. The spam-to-ham ratio is quite variable and shifts with current events such as holidays or elections.

During the 2020 US elections, we saw astronomical spam numbers. However, the reasons for spam classification generally stay the same. Let’s break down the numbers in the pie chart:
 
Basin spam classification pie chart


How Basin Classifies Form Backend Spam


Your first line of defence is Google reCAPTCHA, which runs on your own web page before a form is ever submitted. Based on the chart above, it looks like reCAPTCHA only filters 0.5% of spam, but in reality it’s around double that, because half of the submissions never pass the challenge in the first place. The bar chart below, from Google reCAPTCHA, shows that around half of the challenges fail on any given day.
reCAPTCHA success rates
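
For readers curious about what happens after the challenge, here is a minimal sketch of how a backend can confirm a reCAPTCHA token with Google's siteverify endpoint. It shows the general reCAPTCHA flow, not Basin's actual code; the function names and the 5-second timeout are assumptions made for illustration.

```python
import json
import urllib.parse
import urllib.request

# Google's documented verification endpoint for reCAPTCHA tokens.
VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def build_verify_request(secret: str, token: str) -> urllib.request.Request:
    """Build the form-encoded POST that asks Google to validate a token."""
    body = urllib.parse.urlencode({"secret": secret, "response": token}).encode("ascii")
    return urllib.request.Request(VERIFY_URL, data=body, method="POST")


def token_is_valid(response_json: str) -> bool:
    """Interpret the JSON reply; only an explicit `success: true` passes."""
    return json.loads(response_json).get("success") is True


def verify(secret: str, token: str) -> bool:
    """Send the request and return the verdict (performs a network call)."""
    with urllib.request.urlopen(build_verify_request(secret, token), timeout=5) as resp:
        return token_is_valid(resp.read().decode("utf-8"))
```

Keeping the request building and the response parsing in separate functions makes both easy to check without calling Google.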


The next best indicator of spam is the honeypot: a hidden HTML input field that, when filled out, marks the submission as coming from a spam bot. This, along with allowed-domain filtering, accounts for a combined 14% of spam filtering. Both features are optional but highly recommended.
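
As a rough illustration, the two checks described above could look like the sketch below. The field name `website_url`, the allow list, and the reason strings are hypothetical, chosen only for this example; Basin's real configuration may differ.

```python
from urllib.parse import urlparse

# Hypothetical values for illustration only.
HONEYPOT_FIELD = "website_url"
ALLOWED_DOMAINS = {"example.com", "www.example.com"}


def honeypot_tripped(form: dict) -> bool:
    """True when the hidden field holds anything other than whitespace."""
    return bool(str(form.get(HONEYPOT_FIELD, "")).strip())


def domain_allowed(origin: str) -> bool:
    """True when the submitting page's host is on the allow list."""
    host = urlparse(origin).hostname or ""
    return host.lower() in ALLOWED_DOMAINS


def early_spam_reason(form: dict, origin: str):
    """Return a spam reason string, or None when both checks pass."""
    if honeypot_tripped(form):
        return "honeypot"
    if not domain_allowed(origin):
        return "domain_not_allowed"
    return None
```

A human visitor never sees the hidden input, so it arrives empty; a bot that fills in every field trips the check.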

A lot of spam is actually the same content or variations of the same content. This makes it easy to quickly determine whether something is spam. Do we already have 1000 submissions exactly like this one that are all spam? Then there is no need to continue processing. This method is higher up in our list of checks, which might account for its weighting on the pie chart. One interpretation could be that 18.9% of our form endpoint spam is blatantly the same.
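
One way to picture this check is a counter keyed by a hash of the normalized message. The sketch below is an assumption about how such a check could work, not a description of Basin's implementation; the normalization rules and the in-memory counter are illustrative, and the threshold simply echoes the 1000 figure mentioned above.

```python
import hashlib
import re
from collections import Counter

# Assumed threshold, echoing the "1000 submissions" example in the text.
DUPLICATE_THRESHOLD = 1000

# In-memory stand-in for a store of previously seen spam fingerprints.
known_spam_counts = Counter()


def fingerprint(text: str) -> str:
    """Hash the text after collapsing whitespace and lowercasing it."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def record_spam(text: str) -> None:
    """Count one more confirmed spam submission with this content."""
    known_spam_counts[fingerprint(text)] += 1


def is_known_duplicate(text: str, threshold: int = DUPLICATE_THRESHOLD) -> bool:
    """True when this content has been seen as spam at least `threshold` times."""
    return known_spam_counts[fingerprint(text)] >= threshold
```

An exact hash only catches copies that match after normalization; catching looser variations would need a fuzzier similarity measure.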

Next, we check common blocklists for email and IP addresses that belong to known spammers. There are some amazing projects out there, like Project Honey Pot, that help solve this problem. Blocklist checking has proven to be very reliable, leading to almost no false positives.
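
A blocklist lookup can be as simple as the sketch below. The entries are placeholders drawn from ranges and domains reserved for documentation, and the in-memory sets are an assumption; a real deployment would load its data from a blocklist provider.

```python
import ipaddress

# Placeholder entries for illustration only.
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
BLOCKED_EMAILS = {"spammer@example.com"}
BLOCKED_EMAIL_DOMAINS = {"spam.example"}


def ip_blocked(ip: str) -> bool:
    """True when the address falls inside any blocked network."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return False  # unparseable address: leave it to later checks
    return any(addr in net for net in BLOCKED_NETWORKS)


def email_blocked(email: str) -> bool:
    """True when the full address or its domain is on a blocklist."""
    email = email.strip().lower()
    _, sep, domain = email.rpartition("@")
    if not sep:
        domain = ""  # no "@" present, so there is no domain to check
    return email in BLOCKED_EMAILS or domain in BLOCKED_EMAIL_DOMAINS
```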

We’ve made it pretty far without actually looking at the content of your submission, but we have yet to account for 47.3% of our spam reasons. This is where things become subjective. We’ve been working hard on our own in-house AI/ML spam classification tool called Junkbox, which is trained on known spam and ham data. Machine learning can detect small patterns and nuances in data that humans tend to miss. A high Junkbox score and the use of URL shortening in your content are treated as guaranteed spam.
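
To make the two content signals concrete, here is a small sketch that reports which of them fire for a submission. Junkbox itself is not shown: the score is taken as an input, and the 0 to 1 scale, the 0.9 cutoff, the shortener list, and the signal names are all assumptions for illustration. How the signals are combined into a final verdict is left open.

```python
import re
from urllib.parse import urlparse

# A few well-known URL shortener hosts; a real list would be longer.
SHORTENER_HOSTS = {"bit.ly", "tinyurl.com", "t.co", "is.gd", "ow.ly"}
SCORE_THRESHOLD = 0.9  # assumed cutoff on an assumed 0..1 scale

URL_PATTERN = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def uses_url_shortener(text: str) -> bool:
    """True when any link in the text points at a known shortener host."""
    for url in URL_PATTERN.findall(text):
        host = (urlparse(url).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        if host in SHORTENER_HOSTS:
            return True
    return False


def content_signals(text: str, score: float) -> list:
    """Return the names of the content-based signals that fired."""
    signals = []
    if score >= SCORE_THRESHOLD:
        signals.append("high_junk_score")
    if uses_url_shortener(text):
        signals.append("url_shortener")
    return signals
```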

Basin always provides the reason a specific submission was marked as spam. You can review this by opening the submission within your spam inbox.
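
The idea of running checks in order and reporting the first reason that matches can be sketched as follows. The check functions and reason strings are simplified stand-ins invented for this example, not Basin's actual checks.

```python
from typing import Callable, Optional, Sequence

# A check inspects a submission and returns a spam reason, or None to pass.
Check = Callable[[dict], Optional[str]]


def classify(submission: dict, checks: Sequence[Check]) -> Optional[str]:
    """Run checks in order and return the first spam reason, or None for ham."""
    for check in checks:
        reason = check(submission)
        if reason is not None:
            return reason  # stop early: later, costlier checks never run
    return None


# Simplified stand-in checks keyed on hypothetical submission fields.
def honeypot_check(submission: dict) -> Optional[str]:
    return "honeypot" if submission.get("honeypot") else None


def duplicate_check(submission: dict) -> Optional[str]:
    return "duplicate_content" if submission.get("duplicate") else None


CHECKS = [honeypot_check, duplicate_check]
```

Because the first match wins, checks placed earlier in the list collect a larger share of the reasons, even when a later check would also have flagged the same submission.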

Spam submission within Basin inbox with reason displayed at the top.


Why Does Spam Filtering Matter?

 
  • Security: Basin protects your organization from phishing attacks and fraud by flagging spam when we see it. The spam reason we provide might give you that extra pause that saves you from a phishing attack.
  • Email deliverability: We need to do our best to detect spam in order to maintain our email sending reputation, which allows us to put emails in inboxes instead of spam folders. Top mail providers like Google and Microsoft will reject mail that contains spam. Additionally, top transactional email providers will refuse to send your mail if it seems to match spam patterns. Our auto-response mailers depend on deliverability in order to create value for you.
     

How to Handle False Positives

 
We know it’s critical your business receives every lead. Here are some techniques to make sure nothing slips through the cracks:
 
  • Make use of the Basin spam inbox and check it often. 
  • Configure webhooks to trigger on spam submissions. A common configuration might be: Basin spam submissions -> Zapier -> Google Sheets/HubSpot.
  • Subscribe to spam summary emails from Basin.

Here at Basin, our support form triggers a Slack message for all submissions, including spam. If you are already using Slack, this is a great use of our direct integration with them.

Thanks for reading, and happy form building!

Have questions or need help?
Visit our docs or drop us a line at support@usebasin.com.
