Registering Error: Relational Design Question

I'm designing a database for a medical research group in a university
medical center that studies organ transplant patients following surgery.
There is no existing database and all data is currently recorded on a
lengthy paper form. The paper form has hundreds of checkboxes as well as
dozens of free-form data entry fields (to indicate medications, dosages, and
comments). This form is filled out immediately after the initial surgery and
then on each subsequent followup appointment. So this same form is filled
out many times for the same patient over a period of years. For your
"visual" on this paper form, just think about any paper form you have had to
fill out for your medical history for any doctor's appointment you've ever
been to; now multiply the number of fields on that form by by 5 or 6 times
and that's what I'm dealing with.
My question is about the data model and table design to implement:
The unique identifier for patients will be the "Patient ID" assigned by the
hospital from their system of record. Not much debate on that.
One important consideration is all those check boxes. Some mean "true or
false" and can ony mean one or the other; while many check boxes
legitimately can mean "null or true or false". So for each appointment,
there will likely be many NULL values to be dealt with. Also, some of the
check boxes are logically grouped into mutually exclusive options.
Historical records from thousands of paper forms collected over the past
several years will be entered into this db once it's up and running.
My initial opinion is to store the information "vertically" and not
"horizontally"... something like this:
CREATE TABLE [Patients] (
[PatientID] [char] (12) NOT NULL ,
[FirstName] [varchar] (30) NOT NULL ,
[LastName] [varchar] (30) NOT NULL,
-- other columns here...
CONSTRAINT [PK_Patients] PRIMARY KEY CLUSTERED
([PatientID]) ON [PRIMARY]
) ON [PRIMARY]
Here's where all the data collected on the paper form would be stored:
CREATE TABLE [PostTransplantData] (
[PatientID] [char] (12) NOT NULL ,
[ApointmentDateTime] [datetime] NOT NULL ,
[FieldID] [int] NOT NULL ,
[FieldValue] [varchar] (100) NULL ,
CONSTRAINT [PK_PostTransplantData] PRIMARY KEY CLUSTERED
([PatientID],[ApointmentDateTime],[Field
ID]) ON [PRIMARY]
) ON [PRIMARY]
... as opposed to storing it "horizontally" with one column in the
PostTransplantData table per checkbox or other field on the paper form.
Doing that would tie the db design to the paper form... something we don't
want to do for obvious reasons.
NOTE TO CELKO: FieldID in this db is used to refer to the Field on the paper
form, as described elsewhere in the database. AFAIK, Columns in the db are
supposed to map to "real things in the real world" and in our case those
"real things" are FIELDs on a paper form. And no, I'm not trying to
duplicate the paper form in the db (thus the proposed tables above).
Thoughts? Opinions? Suggestions?
Thanks!Jeff wrote:
> I'm designing a database for a medical research group in a university
> medical center that studies organ transplant patients following surgery.
> There is no existing database and all data is currently recorded on a
> lengthy paper form. The paper form has hundreds of checkboxes as well as
> dozens of free-form data entry fields (to indicate medications, dosages, a
nd
> comments). This form is filled out immediately after the initial surgery a
nd
> then on each subsequent followup appointment. So this same form is filled
> out many times for the same patient over a period of years. For your
> "visual" on this paper form, just think about any paper form you have had
to
> fill out for your medical history for any doctor's appointment you've ever
> been to; now multiply the number of fields on that form by by 5 or 6 times
> and that's what I'm dealing with.
> My question is about the data model and table design to implement:
> The unique identifier for patients will be the "Patient ID" assigned by th
e
> hospital from their system of record. Not much debate on that.
> One important consideration is all those check boxes. Some mean "true or
> false" and can ony mean one or the other; while many check boxes
> legitimately can mean "null or true or false". So for each appointment,
> there will likely be many NULL values to be dealt with. Also, some of the
> check boxes are logically grouped into mutually exclusive options.
> Historical records from thousands of paper forms collected over the past
> several years will be entered into this db once it's up and running.
> My initial opinion is to store the information "vertically" and not
> "horizontally"... something like this:
> CREATE TABLE [Patients] (
> [PatientID] [char] (12) NOT NULL ,
> [FirstName] [varchar] (30) NOT NULL ,
> [LastName] [varchar] (30) NOT NULL,
> -- other columns here...
> CONSTRAINT [PK_Patients] PRIMARY KEY CLUSTERED
> ([PatientID]) ON [PRIMARY]
> ) ON [PRIMARY]
>
> Here's where all the data collected on the paper form would be stored:
> CREATE TABLE [PostTransplantData] (
> [PatientID] [char] (12) NOT NULL ,
> [ApointmentDateTime] [datetime] NOT NULL ,
> [FieldID] [int] NOT NULL ,
> [FieldValue] [varchar] (100) NULL ,
> CONSTRAINT [PK_PostTransplantData] PRIMARY KEY CLUSTERED
> ([PatientID],[ApointmentDateTime],[Field
ID]) ON [PRIMARY]
> ) ON [PRIMARY]
> ... as opposed to storing it "horizontally" with one column in the
> PostTransplantData table per checkbox or other field on the paper form.
> Doing that would tie the db design to the paper form... something we don't
> want to do for obvious reasons.
> NOTE TO CELKO: FieldID in this db is used to refer to the Field on the pap
er
> form, as described elsewhere in the database. AFAIK, Columns in the db are
> supposed to map to "real things in the real world" and in our case those
> "real things" are FIELDs on a paper form. And no, I'm not trying to
> duplicate the paper form in the db (thus the proposed tables above).
> Thoughts? Opinions? Suggestions?
> Thanks!
I don't know your business but my first reaction would be that check
boxes on a form are not "real" data they are a means of COLLECTING
data. I'd say you should model the real information represented on the
forms and I suspect your design won't do that well. For example if some
questions are mutually exclusive (like a series of selections
representing age ranges or different possible outcomes) it would
probably be more effective to model them in a single column rather than
multiple columns.
I also don't like the sound of:

> many check boxes
> legitimately can mean "null or true or false"
Even if that is so, that's no reason to use a nullable column to
represent those values in the database. In fact I'd say that if "null"
is a legitimate response then you certainly should NOT represent that
with a null in the database. A null in this case presumably could mean
"unanswered" or "not applicable" or "don't know" - use a proper value
or values to represent that fact.
Hope this helps.
David Portas, SQL Server MVP
Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.
SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--|||David Portas wrote:
> For example if some
> questions are mutually exclusive (like a series of selections
> representing age ranges or different possible outcomes) it would
> probably be more effective to model them in a single column rather than
> multiple columns.
>
Given your example my comment doesn't really explain what I meant to
say. I mean that a more effective design would be to represent such an
attribute as a single value in its own column rather than as a set of
values in a column or as a set of columns.
David Portas, SQL Server MVP
Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.
SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--|||Most of your words make good sense. One row per value, makes sense. Field
= position on paper, ok, but not Field = Questionaire. Also,
AppointmentDateTime probably shouldn't be on the PostTransplantData since
the same value will be true for each. I might also be a bit less likely to
have a table that is so specifically named. You might find that a
PreTransplant review might be needed sometime, and the same format might be
usable.
What I might have is two tables to set up the questionaire (I would use
surrogate keys as my actual keys, but to each their own)
Questionaire
===========
QuesionaireName (pk)
Other information, like description
QuestionaireQuestion <-- you might even break this up into question and
questionaire question to allow questions to be used by multiple
questionaires in the future)
=============
QuestionaireName
Question
pk (QuestionaireName,Question)
Answer Format (checkbox, freeform, etc)
Answer Required
Maybe some form field information if you want to build the visual report
from the information)
Possibly some interpretational information, like for printing
Possibly some ties to information to help the doctor with drug or other
disease interaction
This will give you the data to interpret the answers. Then the other tables:
Patient
=====
PatientId
Name Columns
Etc
Appointment
==========
PatientId
DateTime
AppointmentQuestionaireAnswer
=======================
AppointmentKey (whatever is chosen)
QuestionKey
Answer (probably a variant, but you could just textualize it.
----
Louis Davidson - http://spaces.msn.com/members/drsql/
SQL Server MVP
"Arguments are to be avoided: they are always vulgar and often convincing."
(Oscar Wilde)
"Jeff" <A@.B.COM> wrote in message
news:eSLxP$4HGHA.3056@.TK2MSFTNGP09.phx.gbl...
> I'm designing a database for a medical research group in a university
> medical center that studies organ transplant patients following surgery.
> There is no existing database and all data is currently recorded on a
> lengthy paper form. The paper form has hundreds of checkboxes as well as
> dozens of free-form data entry fields (to indicate medications, dosages,
> and comments). This form is filled out immediately after the initial
> surgery and then on each subsequent followup appointment. So this same
> form is filled out many times for the same patient over a period of years.
> For your "visual" on this paper form, just think about any paper form you
> have had to fill out for your medical history for any doctor's appointment
> you've ever been to; now multiply the number of fields on that form by by
> 5 or 6 times and that's what I'm dealing with.
> My question is about the data model and table design to implement:
> The unique identifier for patients will be the "Patient ID" assigned by
> the hospital from their system of record. Not much debate on that.
> One important consideration is all those check boxes. Some mean "true or
> false" and can ony mean one or the other; while many check boxes
> legitimately can mean "null or true or false". So for each appointment,
> there will likely be many NULL values to be dealt with. Also, some of the
> check boxes are logically grouped into mutually exclusive options.
> Historical records from thousands of paper forms collected over the past
> several years will be entered into this db once it's up and running.
> My initial opinion is to store the information "vertically" and not
> "horizontally"... something like this:
> CREATE TABLE [Patients] (
> [PatientID] [char] (12) NOT NULL ,
> [FirstName] [varchar] (30) NOT NULL ,
> [LastName] [varchar] (30) NOT NULL,
> -- other columns here...
> CONSTRAINT [PK_Patients] PRIMARY KEY CLUSTERED
> ([PatientID]) ON [PRIMARY]
> ) ON [PRIMARY]
>
> Here's where all the data collected on the paper form would be stored:
> CREATE TABLE [PostTransplantData] (
> [PatientID] [char] (12) NOT NULL ,
> [ApointmentDateTime] [datetime] NOT NULL ,
> [FieldID] [int] NOT NULL ,
> [FieldValue] [varchar] (100) NULL ,
> CONSTRAINT [PK_PostTransplantData] PRIMARY KEY CLUSTERED
> ([PatientID],[ApointmentDateTime],[Field
ID]) ON [PRIMARY]
> ) ON [PRIMARY]
> ... as opposed to storing it "horizontally" with one column in the
> PostTransplantData table per checkbox or other field on the paper form.
> Doing that would tie the db design to the paper form... something we don't
> want to do for obvious reasons.
> NOTE TO CELKO: FieldID in this db is used to refer to the Field on the
> paper form, as described elsewhere in the database. AFAIK, Columns in the
> db are supposed to map to "real things in the real world" and in our case
> those "real things" are FIELDs on a paper form. And no, I'm not trying to
> duplicate the paper form in the db (thus the proposed tables above).
> Thoughts? Opinions? Suggestions?
> Thanks!
>
>|||RE:
<< ... a more effective design would be to represent such an attribute as a
single value in its own column rather than as a set of values in a column or
as a set of columns >>
So are you saying that basically I should do something like this:
1. Create a bunch of tables that correctly model the "things about the
patient" that are being collected on the paper forms.
2. Then for the things that can be one of {null, true, false} (i.e.,
collected via check boxes on paper form) there is a column "per thing" that
stores just that: NULL, 1 (true), 0 (false).
Yes? More or less?
Thanks for your helpful feedback.
"David Portas" <REMOVE_BEFORE_REPLYING_dportas@.acm.org> wrote in message
news:1137963709.394009.142820@.g44g2000cwa.googlegroups.com...
> David Portas wrote:
> Given your example my comment doesn't really explain what I meant to
> say. I mean that a more effective design would be to represent such an
> attribute as a single value in its own column rather than as a set of
> values in a column or as a set of columns.
> --
> David Portas, SQL Server MVP
> Whenever possible please post enough code to reproduce your problem.
> Including CREATE TABLE and INSERT statements usually helps.
> State what version of SQL Server you are using and specify the content
> of any error messages.
> SQL Server Books Online:
> http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
> --
>|||RE:
<< A null in this case presumably could mean "unanswered" or "not
applicable" or "don't know" - use a proper value or values to represent that
fact.>>
You are correct.
So to make sure I understand, I would NOT store a NULL value then? I'd store
some [perhaps integer] value that means "unanswered" or "not applicable",
etc?
For some reason I thought NULL was the "catch-all" value to be used to
recommend such things.
"David Portas" <REMOVE_BEFORE_REPLYING_dportas@.acm.org> wrote in message
news:1137962776.499876.6020@.g43g2000cwa.googlegroups.com...
> Jeff wrote:
> I don't know your business but my first reaction would be that check
> boxes on a form are not "real" data they are a means of COLLECTING
> data. I'd say you should model the real information represented on the
> forms and I suspect your design won't do that well. For example if some
> questions are mutually exclusive (like a series of selections
> representing age ranges or different possible outcomes) it would
> probably be more effective to model them in a single column rather than
> multiple columns.
> I also don't like the sound of:
>
> Even if that is so, that's no reason to use a nullable column to
> represent those values in the database. In fact I'd say that if "null"
> is a legitimate response then you certainly should NOT represent that
> with a null in the database. A null in this case presumably could mean
> "unanswered" or "not applicable" or "don't know" - use a proper value
> or values to represent that fact.
> Hope this helps.
> --
> David Portas, SQL Server MVP
> Whenever possible please post enough code to reproduce your problem.
> Including CREATE TABLE and INSERT statements usually helps.
> State what version of SQL Server you are using and specify the content
> of any error messages.
> SQL Server Books Online:
> http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
> --
>|||Jeff wrote:
> RE:
> << ... a more effective design would be to represent such an attribute as
a
> single value in its own column rather than as a set of values in a column
or
> as a set of columns >>
> So are you saying that basically I should do something like this:
> 1. Create a bunch of tables that correctly model the "things about the
> patient" that are being collected on the paper forms.
> 2. Then for the things that can be one of {null, true, false} (i.e.,
> collected via check boxes on paper form) there is a column "per thing" tha
t
> stores just that: NULL, 1 (true), 0 (false).
> Yes? More or less?
> Thanks for your helpful feedback.
>
1. Yes.
2. I don't use 1 and 0 to represent TRUE and FALSE if I can avoid it.
What is the *fact* that is true or false? That I am male? Or that I'm
over 30 years old? So how about using columns for Gender and Age with
proper codes or values instead of 1 or 0. If I must represent TRUE and
FALSE then I'll use a character code so that I can easily add new
statuses if the need arises. Avoid BIT. The BIT type has so many
peculiarities that it's more trouble that it's worth IMO.
David Portas, SQL Server MVP
Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.
SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--|||Jeff wrote:
> RE:
> << A null in this case presumably could mean "unanswered" or "not
> applicable" or "don't know" - use a proper value or values to represent
> that fact.>>
> You are correct.
> So to make sure I understand, I would NOT store a NULL value then? I'd
> store some [perhaps integer] value that means "unanswered" or "not
> applicable", etc?
> For some reason I thought NULL was the "catch-all" value to be used to
> recommend such things.
>
Yes and no. Opinions about NULLs differ - in fact it's a highly
controversial topic. The problem is that NULLs aren't like real values. They
are treated differently in SQL logic. That means the more NULLs you have the
more complex your code ultimately becomes and likely the more mistakes you
will make. For this reason sensible designers use NULLs infrequently and
many avoid them altogether.
David Portas, SQL Server MVP
Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.
SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--|||RE:
<< For this reason sensible designers use NULLs infrequently and many avoid
them altogether.>>
Yes, I've been aware that NULLs are both controversial and I've minimized
their use as a matter of course. This project, however, appears like it will
frequently have a *majority* of entries [collected during any given patient
appointment] "not applicable" or "unknown" or "legitimately blank".
So, what would a sensible designer do to represent the values "not
applicable" or "unknown" or "legitimately blank". Would an integer column be
created with values coded to the various meanings? e.g., 0=not applicable,
1=unknown, 2=legitimately blank, and then other values assigned to true,
false, yes, no, etc?
Thanks again for your helpful feedback.
-Jeff
"David Portas" <REMOVE_BEFORE_REPLYING_dportas@.acm.org> wrote in message
news:euX2kK6HGHA.2696@.TK2MSFTNGP14.phx.gbl...
> Jeff wrote:
> Yes and no. Opinions about NULLs differ - in fact it's a highly
> controversial topic. The problem is that NULLs aren't like real values.
> They are treated differently in SQL logic. That means the more NULLs you
> have the more complex your code ultimately becomes and likely the more
> mistakes you will make. For this reason sensible designers use NULLs
> infrequently and many avoid them altogether.
> --
> David Portas, SQL Server MVP
> Whenever possible please post enough code to reproduce your problem.
> Including CREATE TABLE and INSERT statements usually helps.
> State what version of SQL Server you are using and specify the content
> of any error messages.
> SQL Server Books Online:
> http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
> --
>|||Jeff wrote:
> RE:
> << A null in this case presumably could mean "unanswered" or "not
> applicable" or "don't know" - use a proper value or values to represent th
at
> fact.>>
> You are correct.
> So to make sure I understand, I would NOT store a NULL value then? I'd sto
re
> some [perhaps integer] value that means "unanswered" or "not applicable",
> etc?
> For some reason I thought NULL was the "catch-all" value to be used to
> recommend such things.
>
Yes and no. Opinions about NULLs differ - in fact it's a highly
controversial topic. The problem is that NULLs aren't like real values.
They are treated differently in SQL logic. That means the more NULLs
you have the more complex your code ultimately becomes and likely the
more mistakes you will make. For this reason sensible designers use
NULLs infrequently and many avoid them altogether.
David Portas, SQL Server MVP
Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.
SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--

Monday, March 26, 2012

Relational Design Question

No comments:

Post a Comment

Registering Error

Blog Archive

About Me