Flashback: Optimizing the Textio Indexing Service: A Data-Driven Approach to Improving User Engagement
An image of the Textio Index Service UI circa 2017.
When Textio, a Seattle-based company that builds AI-powered writing software, launched its Textio Indexing Service (TIS), it gave businesses a tool to analyze the language used in their job listings. By comparing a company's job descriptions against its competitors', TIS helped companies understand how optimizing their postings could attract more diverse talent. However, as with many early-stage tools, the initial version of TIS faced challenges, particularly outdated data and low-quality search results, which led to suboptimal user engagement.
As a Software Engineer at Textio, I was tasked with improving TIS to make it more effective and impactful for both our prospects and sales teams. In this post, I’ll walk through how I used data analysis, hypothesis-driven development, and software engineering to optimize this service and significantly enhance its value to users.
The Problem: Outdated Data and Poor Search Results
The Textio Indexing Service was designed to attract prospects by offering insights into the effectiveness of their job postings compared to competitors. However, the service had several issues:
Outdated Data: The data used by the service was often old and not aligned with the latest trends in job posting performance.
Low-Quality Search Results: The search functionality often failed to return useful or relevant results, frustrating users and diminishing the service's value.
Inaccurate Competitor Identification: Automatically generated competitor lists were sometimes wrong, making it difficult for users to compare their postings with the most relevant industry peers.
This led to lower user engagement and missed opportunities for both our users and the sales team.
The Goal: Optimize TIS for Better Engagement and Accuracy
My goal was clear: optimize the Textio Indexing Service to improve the quality of the data and search results, enhance the user experience, and ultimately drive higher engagement with the tool. Specifically, I aimed to:
Improve the data recency of the service.
Refine the search functionality for more accurate results.
Enhance the way competitors were identified for comparison.
These changes would directly impact how prospects interacted with the service, ultimately increasing the number of report downloads and, more importantly, generating higher-quality leads for the sales team.
Step 1: Analyzing Data to Understand User Behavior
To get a clearer picture of what was happening, I started by analyzing clickstream data stored in AWS S3. Clickstream data tracks user activity, showing how they navigated through the site before arriving at the TIS landing page, which pages they interacted with, and what actions they took afterward.
I used SQL to query this data, joining it with user account data and graph database information. This allowed me to trace the user flow leading up to a search and identify where users might have been dropping off; a sketch of the kind of query involved follows the list below. For example, I wanted to know:
What pages did users visit before landing on the TIS landing page?
How many actions did they perform before initiating a search?
How relevant and accurate were the search results they received before downloading a report?
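To make this concrete, here is a simplified sketch of the kind of funnel query I mean. The table and column names (clickstream_events, user_accounts, the event types) are illustrative stand-ins rather than the actual schema, and the real queries also pulled in graph data:

```python
# Illustrative sketch only: the table and column names are invented,
# not Textio's actual schema. A query like this ran against the
# clickstream data in S3, joined with user account data.
FUNNEL_QUERY = """
SELECT
    e.session_id,
    MIN(e.referrer_url) AS entry_page,
    SUM(CASE WHEN e.event_type = 'search'   THEN 1 ELSE 0 END) AS searches,
    SUM(CASE WHEN e.event_type = 'download' THEN 1 ELSE 0 END) AS downloads
FROM clickstream_events AS e
LEFT JOIN user_accounts AS u ON u.user_id = e.user_id
WHERE e.landing_page = '/textio-index'
GROUP BY e.session_id;
"""
# Sessions with searches but no downloads flag where users drop off.
```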
By analyzing this data, I was able to identify key issues such as:
Low-quality search results: Many users were leaving the page without downloading reports because the search results didn’t meet their expectations.
Outdated data: Some users were interacting with data that was no longer relevant, leading to frustration and reduced trust in the service.
Poor competitor identification: Competitors were sometimes incorrectly identified, making comparisons less useful.
Step 2: Hypothesis-Driven Development to Identify the Root Causes
Based on my analysis, I formed a hypothesis that the search result quality and data recency were key factors affecting the service’s performance. Additionally, I identified that the logic behind competitor identification could be improved.
I proposed several key changes:
Data Recency: Ensure that newly added data is automatically reflected in the service so users always have access to the most up-to-date job posting data.
Improved Search Logic: Refine the algorithm to ensure that search results are more relevant and actionable.
Better Competitor Comparison: Improve how competitors were identified, making sure they were directly relevant to the user’s company and industry.
Step 3: Building the Solution
I then set about building a solution for these issues, leveraging AWS Lambda and several other technologies.
Automating Data Updates Using AWS Lambda
The first problem I tackled was ensuring data recency. I built a microservice using AWS Lambda that automatically updated the data used by TIS whenever new data was dropped into S3. This eliminated the need for manual updates and ensured that users always had access to the latest data. To make this work, I wrote several complex SQL queries to retrieve the relevant data, such as job posting language and associated metrics, and made sure it was processed correctly for use in TIS.
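As a rough illustration of the shape of that microservice, the Lambda handler below reacts to an S3 ObjectCreated notification. The bucket layout and the refresh_index helper are hypothetical, shown only to make the trigger-and-refresh flow concrete:

```python
import boto3

s3 = boto3.client("s3")

def handler(event, context):
    """Invoked by an S3 ObjectCreated notification when a new data file lands."""
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        refresh_index(body)

def refresh_index(raw_bytes):
    # Hypothetical helper: parse and validate the new job-posting data,
    # then load it into the store that backs TIS.
    ...
```

Because the function is wired to the bucket's event notifications, the index refreshes itself the moment a new data drop lands, with no manual step in between.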
Optimizing Competitor Identification with TF-IDF and Cosine Similarity
Next, I turned my attention to improving the competitor identification process. I experimented with TF-IDF (Term Frequency-Inverse Document Frequency) and Cosine Similarity, which are widely used methods for measuring text similarity. By applying these techniques, I was able to improve the way competitors were identified and ensure that users saw more relevant comparisons. This approach allowed us to automatically define competitors based on job posting language and context, making the comparisons more valuable for users.
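With scikit-learn, the pairing takes only a few lines. The companies and postings below are invented toy data standing in for each company's aggregated job-posting text:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus: one "document" per company, built from its job-posting text.
companies = ["acme", "globex", "initech"]
postings = [
    "senior backend engineer distributed systems aws",
    "backend engineer cloud services aws kubernetes",
    "retail sales associate customer service",
]

# TF-IDF turns each company's postings into a weighted term vector...
vectors = TfidfVectorizer(stop_words="english").fit_transform(postings)

# ...and cosine similarity scores how close any two companies' language is.
scores = cosine_similarity(vectors)

# Rank the companies most similar to the first one (excluding itself).
ranked = sorted(
    ((companies[j], scores[0, j]) for j in range(1, len(companies))),
    key=lambda pair: pair[1],
    reverse=True,
)
print(ranked)  # globex scores higher than initech for acme
```

The intuition is that TF-IDF down-weights boilerplate shared by every posting and up-weights distinctive vocabulary, so companies hiring for similar roles in similar language end up close together in vector space.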
Improving Search Functionality: Implementing Fuzzy Matching and Caching for Speed
Another critical issue I identified was that the search functionality was not performing as expected. The algorithm used to return search results was slow and often failed to suggest relevant search completions as users typed, leading to a poor user experience.
To address this, I re-implemented the search index as a MARISA trie, a structure optimized for efficiently handling partial matches and serving real-time search suggestions. Additionally, I integrated a fuzzy matching algorithm, which allowed the system to better account for variations in user input and return more accurate results, even when the search terms weren't an exact match to the indexed data.
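The sketch below captures the idea with a plain in-memory trie and a standard-library fuzzy fallback. It is a simplification: a MARISA trie stores the same prefix structure far more compactly, and the production fuzzy matching was more involved than difflib:

```python
import difflib

class TrieNode:
    __slots__ = ("children", "terminal")

    def __init__(self):
        self.children = {}
        self.terminal = False

class CompletionTrie:
    """Prefix index over company names for real-time search suggestions."""

    def __init__(self, names):
        self.names = [n.lower() for n in names]
        self.root = TrieNode()
        for name in self.names:
            node = self.root
            for ch in name:
                node = node.children.setdefault(ch, TrieNode())
            node.terminal = True

    def complete(self, prefix, limit=5):
        """Exact prefix completions, with a fuzzy fallback for typos."""
        prefix = prefix.lower()
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                # No exact prefix match: fall back to fuzzy matching.
                return difflib.get_close_matches(
                    prefix, self.names, n=limit, cutoff=0.7
                )
        results = []
        self._collect(node, prefix, results, limit)
        return results

    def _collect(self, node, acc, results, limit):
        if len(results) >= limit:
            return
        if node.terminal:
            results.append(acc)
        for ch in sorted(node.children):
            self._collect(node.children[ch], acc + ch, results, limit)

trie = CompletionTrie(["Textio", "Textron", "Texas Instruments"])
print(trie.complete("text"))   # ['textio', 'textron']
print(trie.complete("txtio"))  # fuzzy fallback: ['textio']
```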
To further speed up search performance, I implemented caching both in-memory and on-disk. Storing frequently accessed search data closer to the application significantly reduced latency, so repeated or common queries could be served immediately instead of reprocessing the same data each time a user interacted with the search feature.
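Here is a minimal sketch of that two-tier arrangement using only the standard library; the cache path and sizes are made up, and the production version differed in the details:

```python
import functools
import shelve

DISK_CACHE_PATH = "search_cache.db"  # illustrative path

def two_tier_cache(func):
    """An in-memory LRU in front of an on-disk shelf: hot queries are
    answered from RAM, warm ones survive process restarts."""
    @functools.lru_cache(maxsize=4096)              # tier 1: in-memory
    @functools.wraps(func)
    def wrapper(query):
        with shelve.open(DISK_CACHE_PATH) as disk:  # tier 2: on-disk
            if query in disk:
                return disk[query]
            result = func(query)
            disk[query] = result
            return result
    return wrapper

@two_tier_cache
def run_search(query):
    # Stand-in for the expensive path: trie lookup, fuzzy matching, ranking.
    ...
```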
Alongside these technical improvements, I also revamped how we captured and tracked search interactions. I enhanced the tracking of each search term entered, as well as the options that users clicked through, creating a more dynamic and responsive search index. This allowed the system to continuously refine search results based on real-time user behavior.
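In spirit, that feedback loop looked like the sketch below: record which suggestion a user actually picks for each typed term, and let those counts influence future rankings. The function names here are hypothetical:

```python
from collections import Counter

click_counts = Counter()  # (typed_term, chosen_option) -> click count

def record_click(typed_term, chosen_option):
    """Called from the UI event handler whenever a suggestion is selected."""
    click_counts[(typed_term.lower(), chosen_option)] += 1

def rerank(typed_term, candidates):
    """Order trie/fuzzy candidates by how often users picked them before."""
    return sorted(
        candidates,
        key=lambda option: click_counts[(typed_term.lower(), option)],
        reverse=True,
    )
```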
Step 4: Collaborating with Marketing and Sales Teams
In parallel with the technical work, I collaborated closely with the marketing and sales teams to align the changes with customer-facing messaging. It was crucial that the new features and improvements matched the marketing language used to pitch the service, ensuring a consistent and compelling experience for prospects.
We treated data recency as the highest-priority feature, since it was the most critical to improving user engagement. I worked with marketing to ensure the new updates were clearly communicated in our promotional materials, making prospects aware that the service now offered more accurate, up-to-date information.
Feature Flag for A/B Testing and Segmentation
To test these changes, I released the new features behind a feature flag. This allowed us to segment users based on company size and industry, and test the new functionality on a subset of users. We could measure the impact of the changes before fully rolling them out.
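Conceptually, the flag evaluated to a simple segment lookup along these lines; the segment values and function name are illustrative, and in practice this kind of check usually lives in a feature-flag service rather than a hard-coded set:

```python
# Segments enrolled in the experiment (illustrative values).
ROLLOUT_SEGMENTS = {
    ("enterprise", "technology"),
    ("mid-market", "healthcare"),
}

def new_tis_enabled(account):
    """Route an account to the new TIS experience if its segment is enrolled."""
    return (account["company_size"], account["industry"]) in ROLLOUT_SEGMENTS

if new_tis_enabled({"company_size": "enterprise", "industry": "technology"}):
    print("serve new search + competitor logic")
else:
    print("serve existing experience")
```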
The Impact: Increased Engagement and Improved Lead Quality
The impact of these changes was significant:
45% increase in report downloads: Users were much more likely to download reports after interacting with the updated service.
15% increase in meeting requests: The improvements to search accuracy and competitor identification led to more prospects requesting meetings with our sales team.
Improved NPS (Net Promoter Score): A new feature that let users submit their own list of competitors also contributed to a higher NPS, indicating better customer satisfaction and engagement with the service.
Conclusion
Optimizing the Textio Indexing Service was a great opportunity to apply data analysis, hypothesis-driven development, and software engineering to solve real-world business challenges. By improving the quality of the data, enhancing search functionality, and refining competitor identification, I was able to significantly boost user engagement and support the sales team in closing more deals.
This project reinforced the importance of using data to drive product improvements and how collaboration between engineering, marketing, and sales can lead to better outcomes for both users and the business.
You can read more about the Textio Index here: https://www.geekwire.com/2016/textio-makes-job-post-data-public-companies-can-find-stack-competition/