Forum Discussion
jamesson_kaupanger
Sep 09, 2019Copper Contributor
Looking for trends in unrelated related data
I don't know if this is in the exact right spot, but here goes. Let's say that I'm a biologist analyzing litters of puppies: I've got six litters I want to analyze. Each of those litters co...
- Sep 13, 2019
What constitutes "wrong" fur color?
In your sample above, the Fur color is a numerical value. If you want to mark this as either right or wrong, then that is already one piece of analysis that you need to undertake in a separate step.
For example, you could create a column in the data that shows "fur color status", for which the value can be wrong or right or whatever. To arrive at the value, you may want to employ a formula that evaluates some of the other properties of that puppy. Maybe skin colors below 10 should have fur colors over 100 and if the fur color is not over 100, it is classified as wrong. Or something like that. An IF function should do the trick. With that function in place, you can then classify the data by the fur color status.
- Do Puppies from Litter 2 have a higher-than-average rate of wrong fur color?
Create a pivot table that shows the average rate for each fur color status in the columns, the litter numbers in the rows.
fur color status Litter correct wrong 1 80% 20% 2 75% 25% 3 95% 5% 4 19% 89% overall 67% 35% - Do all Puppy 4s have a higher-than-average rate of wrong fur color?
use a similar table, but with puppy numbers in the rows.
- Do all Site 21s have a higher-than-average rate of wrong fur color?
use a similar table, but with site numbers in the rows.
- If a Puppy's fur color is wrong, does that mean there's a greater-than-average likelihood that their fur length is wrong?
- If a Puppy's fur color is wrong, does that mean there's a greater-than-average likelihood that their eye color is wrong?
The first three bullets were simple calculations. These last two are about probability and go deeper into the realm of statistical analysis. Again, you will first need to build the structures (helper columns) required to classify by fur length status and eye color status.
If you load the data into the Power Pivot data model, you can add the helper columns there. You can then also use the powerful statistical DAX functions to create measures and then surface the results in pivot tables.
I'm not sure whether your question is more about understanding how Excel works, or what math to use to calculate a value with several variables, or if you need help with the whole concept of statistical analysis.
Neither of these can be explained in a single forum question, because they each have their own learning curves.
Sep 09, 2019
Hello jamesson_kaupanger ,
Power Query will help if you need to clean the data or merge different data sources into one.
For analysis you may want to use Power Pivot. Load the data into the Power Pivot Data Model. Then you can create all kinds of measures for all the different qualities and properties. Total (sum), count and average are just a fraction of what Power Pivot can calculate. With the Power Pivot measures, you can then build pivot tables and charts to visualise the information.
Have a look for Power Pivot articles and tutorials. It was made for the stuff you describe.
jamesson_kaupanger
Sep 10, 2019Copper Contributor
Thanks for responding.
I'm not looking to clean the data up; I trust the results themselves, and I'm trying to analyze them as they are.
I was thinking that Pivot Tables (which I'm assuming is related to Power Pivot, right?) might be the best way to go; it's just obnoxious that I can't seem to make the formatting sticky enough.
- Sep 10, 2019
jamesson_kaupanger Pivot tables existed long before Power Pivot came along. Power Pivot has a new set of functions that can be used to analyze data and it goes way beyond what can be done with traditional pivot tables alone. You will still build pivot tables off the Power Pivot data model, but the measures that can be created with Power Pivot allow for a lot more and a lot differentiated analysis.
Load the data into the PowerPivot data model and create helper columns to help tag and classify data. Then these helper columns can be used in either measures or in filters and slicers of the pivot table you create from the data model.
Power Query can help you consolidate different data sources and shape data to the perfect form before it is loaded into the data model. Power Query on its own is not an analysis tool.
- jamesson_kaupangerSep 11, 2019Copper Contributor
Or perhaps this is a better way to talk about this:
Litter Puppy Site Type of Test Test Value 5 Unique values, not repeated Between five and ten values, unique to one litter but Puppy 1 from litter 1 should be the "same" as Puppy 1 from litter 2 ~30 unique Sites, again, each unique to one Puppy, but each Puppy uses the same sites 4 different test types Each type of test either has one or two tests associated with it The measured value from a given test of a given test type on a specific site of a particular puppy from a particular lot
Each value has an upper and a lower acceptable limit
Clear as mud?
- Sep 12, 2019
jamesson_kaupanger It looks like the tables all have the same structure. In order to analyze across these tables, use Power Query to Append the queries into one. Load this into the Data Model.
I don't understand how puppy 2 can be the same for different litters. Biologically that is not possible, but if you use this as a metaphor only, sure, use the puppy number/code/name as the unique ID.
You can now build pivot tables and use slicers and/or filters to look at all qualities/properties across all tables.
If you need help building these pivot tables, I would need to know what questions the pivot tables/charts should answer.
- jamesson_kaupangerSep 11, 2019Copper Contributor
Okay, then maybe you can help me set up my data; I've been trying to go through the tutorial, and while I think I understand better, I'm still not certain how best to set up my data to analyze it properly.
My data looks roughly like this (assuming that all of the associated data are numbers):
Table 1:
Litter Puppy Site Fur color Fur pattern Skin color Skin pattern 1 1 2 6 7 3 6 1 1 4 24 2345 3 32 1 2 2 76 234 34 0 1 2 4 12 45 23 23 Table 2:
Litter Puppy Site Fur color Fur pattern Skin color Skin pattern 2 1 2 17 234 0 2 2 1 4 57 5 45 12 2 2 2 67 48 99 10 2 2 4 2 5 3 3 Again, this is somewhat simplified: I've got five litters, each of which has between 5 and 10 puppies, etc.
Any suggestions?