Forum Discussion

Copper Contributor

Sep 09, 2019

Solved

Looking for trends in unrelated related data

I don't know if this is in the exact right spot, but here goes. Let's say that I'm a biologist analyzing litters of puppies: I've got six litters I want to analyze. Each of those litters co...

Data Insights

excel

Excel 2016

IngeborgHawighorst
Sep 13, 2019
jamesson_kaupanger

What constitutes "wrong" fur color?

In your sample above, the Fur color is a numerical value. If you want to mark this as either right or wrong, then that is already one piece of analysis that you need to undertake in a separate step.

For example, you could create a column in the data that shows "fur color status", for which the value can be wrong or right or whatever. To arrive at the value, you may want to employ a formula that evaluates some of the other properties of that puppy. Maybe skin colors below 10 should have fur colors over 100 and if the fur color is not over 100, it is classified as wrong. Or something like that. An IF function should do the trick. With that function in place, you can then classify the data by the fur color status.

Do Puppies from Litter 2 have a higher-than-average rate of wrong fur color?

Create a pivot table that shows the average rate for each fur color status in the columns, the litter numbers in the rows.

fur color status

Litter correct wrong

1 80% 20%

2 75% 25%

3 95% 5%

4 19% 89%

overall 67% 35%

Do all Puppy 4s have a higher-than-average rate of wrong fur color?

use a similar table, but with puppy numbers in the rows.

Do all Site 21s have a higher-than-average rate of wrong fur color?

use a similar table, but with site numbers in the rows.

If a Puppy's fur color is wrong, does that mean there's a greater-than-average likelihood that their fur length is wrong?

If a Puppy's fur color is wrong, does that mean there's a greater-than-average likelihood that their eye color is wrong?

The first three bullets were simple calculations. These last two are about probability and go deeper into the realm of statistical analysis. Again, you will first need to build the structures (helper columns) required to classify by fur length status and eye color status.

If you load the data into the Power Pivot data model, you can add the helper columns there. You can then also use the powerful statistical DAX functions to create measures and then surface the results in pivot tables.

I'm not sure whether your question is more about understanding how Excel works, or what math to use to calculate a value with several variables, or if you need help with the whole concept of statistical analysis.

Neither of these can be explained in a single forum question, because they each have their own learning curves.

jamesson_kaupanger

Copper Contributor

Sep 12, 2019

IngeborgHawighorst

For instance:

Let's say Litter 2 Puppy 4's Site 21 fur color is wrong.

Do Puppies from Litter 2 have a higher-than-average rate of wrong fur color?
Do all Puppy 4s have a higher-than-average rate of wrong fur color?
Do all Site 21s have a higher-than-average rate of wrong fur color?
If a Puppy's fur color is wrong, does that mean there's a greater-than-average likelihood that their fur length is wrong?
If a Puppy's fur color is wrong, does that mean there's a greater-than-average likelihood that their eye color is wrong?

IngeborgHawighorst

MVP

Sep 13, 2019

jamesson_kaupanger

What constitutes "wrong" fur color?

In your sample above, the Fur color is a numerical value. If you want to mark this as either right or wrong, then that is already one piece of analysis that you need to undertake in a separate step.

For example, you could create a column in the data that shows "fur color status", for which the value can be wrong or right or whatever. To arrive at the value, you may want to employ a formula that evaluates some of the other properties of that puppy. Maybe skin colors below 10 should have fur colors over 100 and if the fur color is not over 100, it is classified as wrong. Or something like that. An IF function should do the trick. With that function in place, you can then classify the data by the fur color status.

Do Puppies from Litter 2 have a higher-than-average rate of wrong fur color?

Create a pivot table that shows the average rate for each fur color status in the columns, the litter numbers in the rows.

	fur color status
Litter	correct	wrong
1	80%	20%
2	75%	25%
3	95%	5%
4	19%	89%
overall	67%	35%

Do all Puppy 4s have a higher-than-average rate of wrong fur color?

use a similar table, but with puppy numbers in the rows.

Do all Site 21s have a higher-than-average rate of wrong fur color?

use a similar table, but with site numbers in the rows.

If a Puppy's fur color is wrong, does that mean there's a greater-than-average likelihood that their fur length is wrong?
If a Puppy's fur color is wrong, does that mean there's a greater-than-average likelihood that their eye color is wrong?

The first three bullets were simple calculations. These last two are about probability and go deeper into the realm of statistical analysis. Again, you will first need to build the structures (helper columns) required to classify by fur length status and eye color status.

If you load the data into the Power Pivot data model, you can add the helper columns there. You can then also use the powerful statistical DAX functions to create measures and then surface the results in pivot tables.

I'm not sure whether your question is more about understanding how Excel works, or what math to use to calculate a value with several variables, or if you need help with the whole concept of statistical analysis.

Neither of these can be explained in a single forum question, because they each have their own learning curves.

SergeiBaklan
MVP
Sep 20, 2019
jamesson_kaupanger

You may try to play with conditional formatting, but I have no practical experience in applying it to Pivot Tables. Try to google for samples.
jamesson_kaupanger
Copper Contributor
Sep 20, 2019
SergeiBaklan

Correct; I wasn't trying to create a many-to-many relationship: there were blank rows in one of the tables which was causing Excel to think that there were duplicate values.

Is there a way to put a horizontal line (preferably two) on a Pivot Table at a specific point?
SergeiBaklan
MVP
Sep 19, 2019
jamesson_kaupanger

These are two different issues. First you clean the tables from empty rows/columns, the rest is next.

Many-to-many relationships are not supported directly in Excel data model, but that could be workarounds, depends on how your data is structured. Sorry, I didn't check entire this thread, too many information - only answered on concrete post. Perhaps IngeborgHawighorst could give more concrete advice.
jamesson_kaupanger
Copper Contributor
Sep 19, 2019
SergeiBaklan

I could filter them but I couldn't create relationships between tables because there were "duplicate" values.

I've gotten rid of them.

Is there a way to mark on a Pivot Table lines representing minimum or maximum quantities?
SergeiBaklan
MVP
Sep 18, 2019
jamesson_kaupanger

They are empty rows/columns, but they are not clean ones. Most probably if you click Ctrl+End you'll be 5 rows down from your data and 8 columns to the right from it.

You may try to select these empty regions, on ribbon Home->Clear All. Save file, open again and check Ctrl+End if helped.

At the same time you may always filter nulls in Power Query.
jamesson_kaupanger
Copper Contributor
Sep 18, 2019
IngeborgHawighorst

Okay, this is incredibly aggravating: do you have any idea why the Table Import Wizard is importing nearly 500 empty rows, even though those rows don't exist in the original table?

Edit: and 8 extra columns, mind you.
jamesson_kaupanger
Copper Contributor
Sep 13, 2019
IngeborgHawighorst

"Wrong" in the sense that it's an imperfect metaphor; each test has an associated range of acceptable values (6 to 10, -0.61 to -0.21, etc.), and if the tested value lies outside of that range, it's "wrong", or unacceptable, or out of spec.

I'd like to think that the main source of my difficulty, at this point, is in using Excel, given that I've only ever used it to represent and/or calculate data rather than derive conclusions from said data. I say, "...like to think..." because I'm not sure; it may be some combination of all three.

In any case, you've been fabulously helpful. I'll go tweak the data and see what I can come up with. Thank you!

Forum Discussion

Looking for trends in unrelated related data

Share

Resources