PAWSM tools are an attempt at bringing our ideas to the people before we introduce them into our PAWSM dog diet and nutrition mobile app. Each tool is supposed to be simple and lightweight, focused only on the task at hand.
What happens when you run out of your dog’s food? You usually try to buy the same one. But what if you cannot find it? This is when Science Diet Dog Food – PAWSM search for food by similarity (PAWSM Food Similarity Tool for short) comes to the rescue.
PAWSM developed the PAWSM dog diet and nutrition mobile app. Its main purpose is optimal nutrition for your dog, based on its activity and the selected dog food.
What is food similarity?
For our PAWSM dog diet and nutrition mobile app, we had to build a database of the ingredients contained in each dog food. Because every dog food producer lists the data differently, we couldn’t use a web scraper, so the only way to do it was to take each dog food and input the data manually.
Since that took a lot of time and effort, we decided to make something good out of it and made the database searchable. We added additional features such as range search, where you can look for certain ratios of ingredients. That helps you find dog food for special diets that might otherwise prove to be quite difficult to find. The database is being updated regularly (you can even add new non-listed products yourself).
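As a rough illustration of what a range search does (this is not the tool's actual code), the sketch below filters a handful of made-up records by value ranges. The field names, the sample values, and the `range_search` helper are all assumptions for the example.

```python
# Hypothetical records; field names and values are illustrative only.
foods = [
    {"name": "Food A", "protein": 26.0, "fat": 16.0, "fiber": 3.0},
    {"name": "Food B", "protein": 18.0, "fat": 8.0, "fiber": 5.5},
    {"name": "Food C", "protein": 32.0, "fat": 20.0, "fiber": 2.5},
]


def range_search(records, ranges):
    """Return the records whose fields all fall inside the given (low, high) ranges."""
    matches = []
    for record in records:
        if all(low <= record[field] <= high for field, (low, high) in ranges.items()):
            matches.append(record)
    return matches


# Example: a low-fat, moderate-protein diet (percent values).
low_fat = range_search(foods, {"fat": (0.0, 10.0), "protein": (15.0, 25.0)})
print([food["name"] for food in low_fat])
```

Only foods that satisfy every range are returned, which is what makes this kind of search handy for special diets.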
To find a dog food that is similar to the one you searched for, we compare it to the other dog foods in our database. In the following text, we will try to explain the methodology and use case.
For a better understanding of our analysis, we have to take a look at the dog food data we collected.
The data can be roughly grouped into three distinct groups: general dog food information, elementary dog food information, and vitamin dog food information.
General dog food information contains:
- dog food name,
- whether the dog food is AAFCO compliant,
- state of the dog food (dry, wet, raw & treat),
- caloric value and
- nutritional analysis (fat, protein, fiber, moisture, …).
The caloric value and nutritional analysis are numerical, so they can be used for the search by similarity.
The elementary dog food information covers the food’s elementary composition, while the vitamin dog food information contains data on vitamins such as:
- Vitamin A, C, E, K
- Vitamin B1, B2, B3, B5, B6, B9 & B12.
This data is necessary for the nutrition calculator for optimal feeding in our mobile app. The result shows you the best diet for your dog by giving the optimal amount of food it needs per meal each day.
Not all of the dog food information mentioned above is necessary for our similarity search, although it is useful in our PAWSM dog diet and nutrition mobile app. Our analysis uses:
- food name,
- state of the food,
- the caloric value of the food and
- guaranteed analysis of the food.
Now comes the fun part of our blog post – statistics :). As you can imagine, the gathered data can be overwhelming.
The first part is data normalization. It’s a fancy term for bringing all of our numerical data onto a common scale. We have to do this because our data comes in different units and scales (kcal/kg, %, g, …), and we do it with feature scaling.
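To make feature scaling concrete, here is a minimal sketch of min-max scaling, one common form of it. The post does not say which scaling variant the tool uses, so treat the choice of min-max as an assumption. The `min_max_scale` helper and the numbers are made up for the example.

```python
def min_max_scale(values):
    """Rescale numbers linearly so the smallest becomes 0.0 and the largest 1.0."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(value - lowest) / (highest - lowest) for value in values]


# Illustrative columns on very different scales (made-up numbers).
calories_kcal_per_kg = [3200.0, 3600.0, 4000.0]
protein_percent = [18.0, 26.0, 34.0]

print(min_max_scale(calories_kcal_per_kg))  # [0.0, 0.5, 1.0]
print(min_max_scale(protein_percent))       # [0.0, 0.5, 1.0]
```

After scaling, a difference in calories and a difference in protein percentage weigh the same when distances between foods are computed.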
With all that, our data is nice and clean. The only remaining problem is missing data. This step is a bit of trial and error: we must decide whether to remove a food with missing data or to replace the missing values with a substitute that fits the context (e.g. the average value).
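The two options can be sketched as follows. This is an illustration under assumptions, not the tool's code: the record layout, the use of `None` for a missing value, and both helper names are hypothetical.

```python
def drop_incomplete(records, fields):
    """Keep only the records that have a value for every listed field."""
    return [r for r in records if all(r.get(f) is not None for f in fields)]


def impute_mean(records, field):
    """Return copies of the records with missing `field` values replaced by the mean."""
    known = [r[field] for r in records if r.get(field) is not None]
    if not known:
        raise ValueError(f"no known values for {field!r}")
    mean = sum(known) / len(known)
    return [dict(r, **{field: mean if r.get(field) is None else r[field]}) for r in records]


# Made-up example: one manufacturer omitted the moisture content.
foods = [
    {"name": "Food A", "moisture": 10.0},
    {"name": "Food B", "moisture": None},
    {"name": "Food C", "moisture": 12.0},
]

print(len(drop_incomplete(foods, ["moisture"])))      # 2
print(impute_mean(foods, "moisture")[1]["moisture"])  # 11.0
```

Dropping keeps the data honest but shrinks the database, while imputing keeps every food searchable at the cost of an estimated value.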
This is also the part where you can see the quality of the dog food data we collected. It must be stated that the data collected varies from manufacturer to manufacturer, with some manufacturers omitting important data such as the moisture content of the food.
Because of the number of dog food attributes, we cannot group the food by hand. The method we use is k-means clustering, which aims to partition n observations into k clusters so that each observation belongs to the cluster with the nearest mean.
If you remember, there is already a kind of clustering within our dog food data: the state of the food, which can be dry, wet, raw, or treat. In our PAWSM dog diet and nutrition mobile app, we even go so far as to classify everything in a dry, wet, or raw state as “food” and everything else as “treats”.
We apply the k-means algorithm within each state of food. As a result, our search results always share the state of the food you searched for: dry, wet, raw, or treat.
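For readers who want to see the mechanics, below is a minimal pure-Python sketch of the standard k-means procedure (Lloyd's algorithm). It is not the tool's implementation. The function names, the random initialization, and the sample points (imagined as already scaled protein and fat values) are assumptions for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, max_iterations=100, seed=0):
    """Lloyd's algorithm: returns (labels, centroids) for the given points."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # k distinct starting points
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster with the nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Made-up, already scaled (protein, fat) pairs forming two obvious groups.
points = [(0.0, 0.0), (0.1, 0.1), (0.9, 0.9), (1.0, 1.0)]
labels, centroids = k_means(points, k=2)
print(labels, centroids)
```

Which group gets label 0 depends on the random starting points, so only the grouping itself is meaningful, not the label numbers.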
The number of clusters is determined with the Silhouette method. The silhouette value measures how similar an object is to its own cluster (cohesion) compared to other clusters (separation). It ranges from -1 to 1, and higher values mean the object sits well within its cluster.
As you can see on the Silhouette method chart, the optimal number of clusters for dry dog food stored in our database is 9. The process is similar for all other states of food.
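The silhouette calculation itself can be sketched in a few lines. The helper below is a hypothetical illustration, not the tool's code: it averages the silhouette value over all points for a given labeling. To choose the number of clusters, one would cluster the data for each candidate k and keep the k with the highest average.

```python
import math


def mean_silhouette(points, labels):
    """Average silhouette value over all points (needs at least two clusters)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # by convention, a point alone in its cluster scores 0
        # a: mean distance to the other members of the point's own cluster (cohesion)
        a = sum(math.dist(point, other) for other in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster (separation)
        b = min(
            sum(math.dist(point, other) for other in members) / len(members)
            for other_label, members in clusters.items()
            if other_label != label
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Made-up scaled points: one sensible grouping and one that mixes the groups.
points = [(0.0, 0.0), (0.1, 0.1), (0.9, 0.9), (1.0, 1.0)]
good = mean_silhouette(points, [0, 0, 1, 1])
bad = mean_silhouette(points, [0, 1, 0, 1])
print(good > bad)  # True
```

A grouping that keeps nearby foods together scores close to 1, while one that mixes distant foods scores below 0.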
The only thing remaining is to apply the results of the grouping to our database. The grouping partitions the data space into Voronoi cells. Note that k-means clustering minimizes within-cluster variances (squared Euclidean distances), not regular Euclidean distances.
Now that we have explained the reasoning behind the tool, we can see it in action. As stated at the beginning, this tool is useful when you want to buy your usual dog food and cannot find it. The result of our Science Diet Dog Food search is three randomly selected dog foods from the same cluster.
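For reference, the quantity k-means minimizes is commonly written as below. The notation is an assumption made for illustration: S_1, …, S_k are the clusters, x is a food's vector of scaled values, and μ_i is the mean of cluster S_i.

```latex
\underset{S_1,\dots,S_k}{\arg\min} \; \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^{2},
\qquad
\mu_i = \frac{1}{|S_i|} \sum_{x \in S_i} x
```

Each food ends up in the Voronoi cell of its nearest cluster mean.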
When you are on the tool’s webpage, the first thing you’ll see is a short description. You can enter your dog food name in the input field and we’ll display the results of the search below it. The search is done within our dog food database, which is updated constantly.
You can scroll through the search results with the “Next” and “Previous” buttons and select the desired dog food by clicking on it. You can always change the search term.
The first result displayed is the selected dog food information. If you click on the pulsing arrow, you can see its additional information: protein, fat, and fiber. If you wish to see other dog food information, you can always download our PAWSM dog diet and nutrition mobile app and look up your dog food in it.
The secondary results are similar dog foods. We display three randomly selected foods from the same k-means cluster.
If the food you are searching for is not in our database, we provide a checkbox that you can tick if you want us to add the chosen food to our database.
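The selection step could look roughly like the sketch below. It assumes each food already carries its state and a cluster label that is numbered separately within each state. The record layout and the `similar_foods` helper are hypothetical, not the tool's actual code.

```python
import random

# Hypothetical rows: cluster labels are only meaningful within one state.
foods = [
    {"name": "Dry A", "state": "dry", "cluster": 0},
    {"name": "Dry B", "state": "dry", "cluster": 0},
    {"name": "Dry C", "state": "dry", "cluster": 0},
    {"name": "Dry D", "state": "dry", "cluster": 0},
    {"name": "Dry E", "state": "dry", "cluster": 1},
    {"name": "Wet A", "state": "wet", "cluster": 0},
]


def similar_foods(records, selected_name, count=3, rng=random):
    """Pick up to `count` random foods sharing the selected food's state and cluster."""
    # Raises StopIteration if the name is not in the records.
    selected = next(r for r in records if r["name"] == selected_name)
    candidates = [
        r for r in records
        if r["state"] == selected["state"]
        and r["cluster"] == selected["cluster"]
        and r["name"] != selected_name
    ]
    return rng.sample(candidates, min(count, len(candidates)))


picks = similar_foods(foods, "Dry A")
print(sorted(food["name"] for food in picks))  # ['Dry B', 'Dry C', 'Dry D']
```

Because the state is part of the match, a dry food never shows up as a suggestion for a wet one, even when both carry the same cluster number.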
Clustering is a powerful tool for managing large quantities of data, and this is only the beginning. Please let us know if you wish to see similar content or tools and we’ll try our best to make it happen.
Changing your dog’s food can be stressful and this is our small contribution to ease this task.
Stay Awesome, use PAWSM.
PS: What dog food do you use? Do you have any dog food ideas that you want to share with the world? Comments are welcome in the comment section below or on our FB page.