4.14: Study Questions - Data and Databases
Study Questions
What is the difference between data, information, and knowledge?
- Answer
-
Data are raw facts and figures. Information is data with context and organization, answering who, what, when, where. Knowledge is the application of information and experience to gain insight for action and decision-making.
Explain in your own words the difference between the hardware and software components of information systems.
- Answer
-
Hardware is the physical components of an information system like computers, networks, and devices. Software is the programs and instructions running on the hardware, including operating systems, applications, utilities, etc.
What is the difference between quantitative data and qualitative data? In what situations could the number 63 be considered qualitative data?
- Answer
-
Quantitative data are numeric values from measurement or calculation. Qualitative data are descriptive qualities that cannot be measured numerically. The number 63 could be qualitative if it represents a code or identifier, not a measured quantity.
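The distinction can be illustrated with a minimal Python sketch. The lists `weights_kg` and `jersey_numbers` are hypothetical examples, not data from this chapter; they only show how the same digits can be a measurement in one context and a label in another.

```python
from statistics import mean

# Hypothetical data: the same digits used in two different roles.
weights_kg = [63, 70, 59]       # quantitative: measured values
jersey_numbers = [63, 70, 59]   # qualitative: identifiers for players

# Averaging measurements answers a meaningful question.
average_weight = mean(weights_kg)

# Jersey numbers are labels, so they are stored as text.
# Averaging them would produce a number with no meaning.
jersey_labels = [str(n) for n in jersey_numbers]

print(average_weight)
print(jersey_labels)
```

Storing an identifier as text is one common way to signal that arithmetic on it is not intended.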
What are the characteristics of a relational database?
- Answer
-
A relational database organizes data into tables of rows and columns. Each row is identified by a unique key, each column has a field definition, and tables are related to one another through keys that link their rows.
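These characteristics can be sketched with Python's built-in `sqlite3` module. The table and column names (`students`, `memberships`, `student_id`, `club_name`) are assumptions made up for illustration and are not the chapter's actual Student Clubs design.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Each table has a primary key that uniquely identifies a row.
conn.execute("""
    CREATE TABLE students (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL
    )""")

# memberships.student_id is a foreign key linking back to students.
conn.execute("""
    CREATE TABLE memberships (
        membership_id INTEGER PRIMARY KEY,
        student_id    INTEGER NOT NULL REFERENCES students(student_id),
        club_name     TEXT NOT NULL
    )""")

conn.executemany("INSERT INTO students VALUES (?, ?)",
                 [(1, "Ana"), (2, "Ben")])
conn.executemany(
    "INSERT INTO memberships (student_id, club_name) VALUES (?, ?)",
    [(1, "Chess"), (1, "Robotics"), (2, "Chess")])

# A join follows the key relationship to combine rows from both tables.
rows = conn.execute("""
    SELECT s.name, m.club_name
    FROM students AS s
    JOIN memberships AS m ON m.student_id = s.student_id
    ORDER BY s.name, m.club_name""").fetchall()

print(rows)
```

The join at the end is what "tables are related via keys" means in practice: the student's name is stored once and looked up through `student_id`.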
When would using a personal DBMS make sense?
- Answer
-
A personal DBMS makes sense for small data management needs for one person or a small workgroup, not enterprise-wide sharing.
What is the difference between a spreadsheet and a database? List three differences between them.
- Answer
-
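A spreadsheet is a simple data organization tool optimized for basic data manipulation by an individual. A database is designed for complex querying, large data volumes, multi-user access, and ensuring data integrity. Three differences follow from this: a database handles much larger volumes of data, supports many simultaneous users, and enforces rules that protect data integrity.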
Describe what the term normalization means.
- Answer
-
Normalization is designing a database to reduce duplication and redundancy and improve flexibility to change. It involves separating data into multiple tables and establishing relationships.
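As a rough illustration, the sketch below splits a flat list of rows into two related tables. The field names and sample values are hypothetical, and the function `normalize` is an illustrative helper rather than a standard library routine.

```python
# Hypothetical flat data: Ana's name and email are repeated in every row.
flat_rows = [
    {"student": "Ana", "email": "ana@example.edu", "club": "Chess"},
    {"student": "Ana", "email": "ana@example.edu", "club": "Robotics"},
    {"student": "Ben", "email": "ben@example.edu", "club": "Chess"},
]

def normalize(rows):
    """Split flat rows into a students table and a memberships table."""
    students = {}      # email -> student record, stored only once
    memberships = []   # one row per (student, club) pair
    for row in rows:
        if row["email"] not in students:
            students[row["email"]] = {
                "student_id": len(students) + 1,
                "name": row["student"],
                "email": row["email"],
            }
        memberships.append({
            "student_id": students[row["email"]]["student_id"],
            "club": row["club"],
        })
    return list(students.values()), memberships

students, memberships = normalize(flat_rows)
print(students)
print(memberships)
```

After the split, changing Ana's email means updating one record instead of every row that mentions her.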
What is Big Data?
- Answer
-
Big data refers to large, complex datasets with high volume, velocity, and variety that require advanced storage, processing, and analysis techniques.
Name a database you interact with frequently. What would some of the field names be?
- Answer
-
A database could be something like contacts on your phone - fields would include name, phone number, email, birthdate, etc.
Describe what open-source data is and what its benefits are.
- Answer
-
Open-source data is data that is freely shared so that users can access, use, and modify it as needed, much as open-source software makes its source code publicly available. Benefits include flexibility, transparency, and low cost.
Name three advantages of using a data warehouse.
- Answer
-
Data warehouse advantages: Integrates data from disparate sources, provides enterprise-wide analytics, and creates historical records of data over time.
What is data mining?
- Answer
-
Data mining is analyzing large datasets to discover patterns and automatically extract meaningful information and insights.
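One very simple form of pattern discovery is counting which items appear together. The sketch below assumes a small, made-up list of shopping baskets; `frequent_pairs` is a hypothetical helper, and real data-mining tools use far more sophisticated methods on far larger datasets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets; each set is one customer's purchase.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "milk", "eggs"},
]

def frequent_pairs(baskets, min_count):
    """Return item pairs that occur together in at least min_count baskets."""
    counts = Counter()
    for basket in baskets:
        # Sorting makes each pair appear in a consistent order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}

pairs = frequent_pairs(transactions, min_count=2)
print(pairs)
```

A retailer might use a pattern like this to decide which products to shelve or promote together.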
What is metadata? Provide one example.
- Answer
-
Metadata is "data about data" - it provides information and context about the actual data in a database. An example is a database column's data type (e.g. text, number, date, etc.), which describes the type of data contained in that column.
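Most database systems let you query metadata directly. The sketch below uses Python's built-in `sqlite3` module and SQLite's `PRAGMA table_info` statement to list column names and declared types. The `contacts` table is a hypothetical example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT, phone TEXT, birthdate DATE)")

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
column_types = {
    row[1]: row[2]
    for row in conn.execute("PRAGMA table_info(contacts)")
}

print(column_types)
```

The result describes the table's structure without containing any of the contact records themselves, which is what makes it metadata.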
What are some potential ethical concerns when utilizing personal data for business analytics and data mining?
- Answer
-
Some ethical concerns include:
- Lack of consent from individuals about data collection and use
- Violating privacy expectations by combining various data sources
- Misuse of data profiling to discriminate against groups of people
- Drawing faulty conclusions from incomplete data or incorrect analytics
- Data breaches that expose personal information
- Manipulating consumers through personalized advertising
- Perpetuating existing biases by using biased or unrepresentative data
Exercises
- Review the design of the Student Clubs database earlier in this chapter. Reviewing the lists of data types given, what data types would you assign to each of the fields in each of the tables? What lengths would you assign to the text fields?
- Review structured and unstructured data and list five reasons to use each.
- Using Microsoft Access, download the database file of comprehensive baseball statistics from the website SeanLahman.com. (If you don’t have Microsoft Access, you can download an abridged version of the file here that is compatible with Apache OpenOffice.) Review the structure of the tables included in the database. Identify three different data-mining experiments you would like to try, and explain which fields in which tables would have to be analyzed.
- Do some original research and find two examples of data mining. Summarize each example and then write about what the two examples have in common.
- Conduct some independent research on the process of business intelligence. Using at least two scholarly or practitioner sources, write a two-page paper giving examples of how business intelligence is used.
- Use the Internet to research two software tools or technologies focused on managing knowledge. One tool should be for personal use, and one for business/organizational use. Write a one-page comparison explaining what each tool does, key features, who would use it, and how it facilitates knowledge management. Reflect on how the personal and organizational tools differ in their knowledge management approaches.
- Install a simple personal DBMS like OpenOffice Base and design a basic database to store information on books, music, or movies. Focus on key fields and data types.
- Search public data repositories and find three datasets of interest. Explain how basic data analysis could provide insights.
- Interview two peers about databases they use and how data is valuable to them. Summarize key learnings in a paragraph.
- Research a widely used database platform. Explain the key capabilities and a simple use case in 2-3 sentences.
- Draw an entity relationship diagram for a basic scenario like university enrollment with 5-6 entities.
- Find an article discussing an ethics case about personal data. Summarize the key issue and perspective in a paragraph.
- Take a spreadsheet containing duplicated student data and restructure it into a simple, normalized database format.
- Import a public dataset into a spreadsheet. Make basic charts and graphs to visualize and summarize top-level trends.