Well, I officially have some data now, by which I mean I have an Excel spreadsheet full of numbers that I can plug into Stata (the statistical software we use) and run regressions and things on. I've only entered data for 30 borrowers (out of nearly 500) so far, but since it's panel data, that's actually almost 400 observations (one for each year for each borrower), so it's enough to get started with. Also, there may still be some major changes to how we construct certain variables and what we do or do not include, so I'm not planning on entering all the data until winter break, during which time I'll be staying in SF (well, except for going to Bako for X-mas) because I can't afford to do anything fun. Which is okay, because I could use a mellow couple of weeks in the city, doing semi-mindless data entry and doing the things that I never get around to, like going to museums and Golden Gate Park, etc.
So, tonight I spent some time running descriptive statistics on my variables (mean, standard deviation, coefficient of skewness, etc.), and then I ran a few very preliminary regressions. If you all understood econometrics, I could tell you some things that you'd think were hilarious in a depressing, pathetic sort of way. Like the fact that I ran a tobit regression and it couldn't even find a single set of coefficients that would maximize the likelihood function. So then I transformed my dependent variable from a multinomial to a binomial, and ran a logit regression, but none of my variables were even remotely significant. If a variable is significant at the 10% level, then its "p-value" is 0.1 or lower, meaning that if the variable really had no relationship with the dependent variable, you'd see a correlation at least this strong only about 10% of the time by pure chance. When you get insignificant results, it's usually a p-value of like 0.5 or 0.6 at most (occasionally higher). I actually had several p-values of 1.0 (i.e. the data offer no evidence whatsoever that the variable matters). I didn't even know that was possible (Dr. J's comment: "Wow. Yeah, that's possible, but it's pretty hard to do."). So yeah, I'm going to have to come up with a somewhat more complicated model in order to get anything meaningful. And things might improve as my sample gets larger. But my main problem is that since not many borrowers acquired durable business assets, my dependent variable has a whole bunch of zero values, which messes things up.
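For the curious, here's roughly where those p-values come from. This is just a minimal sketch in Python rather than Stata, and it assumes the usual large-sample setup where each coefficient's test statistic (the coefficient divided by its standard error) is compared against a standard normal distribution. The function name `two_sided_p_value` and the example z-values are made up for illustration; none of them come from my actual output.

```python
import math

def two_sided_p_value(z):
    """Two-sided p-value for a z-statistic under a standard normal.

    Assumption: z = coefficient / standard error, and the normal
    approximation holds (as in a typical logit output table).
    """
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical z-statistics, chosen only to show the pattern.
for z in (0.0, 0.5, 1.645, 1.96):
    print(f"z = {z:5.3f}  ->  p = {two_sided_p_value(z):.3f}")
```

A z-statistic of about 1.645 gets you to the 10% level, and the only way to land on a p-value of 1.0 is a z-statistic of essentially zero, which I suppose is why it's so hard to do.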
But at least I can set that aside for a day or two and focus on other things. Tonight I'm just going to read for a while and try to get to bed "early", so that I can get up tomorrow and do laundry and other productive things.
2 comments:
Well, keep adding data and see what happens. And if you don't get any trends at all... just remember "Lack of evidence is not evidence of lack," whatever that means... the lab folks over here are always quoting it when their experiments don't give the desired results.
Yeah, my shrink told me that I seemed depressed yesterday, and I said I was frustrated because my data sucks (and this was even before the p-values of 1.0, which were so bad that they actually ended up amusing me), and he said all the same things to me. "When we did research in med school, blah blah blah..." i know that insignificant results are still significant, so if no one bought durable business assets, that's fine. but the other problem is the quality of the data. there's another saying in research: "garbage in, garbage out." and i'm starting to feel like my data is garbage (which is somewhat true and somewhat my paranoia).
the upshot (or downside, depending on how you look at it) is that i might end up getting to use a fairly novel approach, because i have lower-limit censored data (i.e. lots of zeros), but rather than the non-zero values being a continuous variable, they are a discrete ordered variable. normally you'd do a tobit or a heckman model and put a standard OLS regression inside of it, but i might have to do a heckman with a multinomial logit inside of it, which dr. j says he's never seen before.