What is database
- databases is a computer file that follows a specific structure and rules in order to allow the input, organization, and most importantly retrieval of data very quickly.
- It does this by organizing data into tables that could be sorted and filtered in very flexible ways.
- So a database is just a structured collection of information.
Database Management System (DBMS)
The DBMS provides three very important tasks. SQL Server is one of them
- Helps us create the structural rules that our data will adhere to.
- This helps keep the data organized and consistent and provides predictable results when it comes time to retrieve the information that we previously stored.
- Help load data into the framework we've already established.
- Things like writing data to the tables and ensuring that it conforms to the established rules as well as helping users sort through the data to find trends or produce reports.
Provides additional support for the database such as tracking users and providing log-in credentials, performing maintenance and routine backups, as well as a host of additional administrative tasks that protect and secure your data.
Databases work much the same way and by providing multiple indexes to the data, you could dramatically speed up searches.
Providing a very robust way to ask questions of the data. This is a process called querying.
- Queries allow you to sort, filter, organize, and summarize your data in nearly endless ways. And the best part is, is that you don't need to know how you will want to view the data when you begin storing it.
Flat File
- A flat file database is a simple two-dimensional table structure that's made up of rows and columns
- Typically they are saved as simple plain text files, and use distinct separation characters called delimiters to define where each column starts and stops.
- Each row in a flat file table represents a single entity of some group of things that you want to keep track of. These are referred to in database terms as a record.
- Each column, sometimes called fields in database terminology, in the table represents different attributes, or distinct pieces of information about that person, place, thing, or event.
- The columns are usually broken down into the smallest piece possible. This is through a process called normalization, which we'll talk about in depth later on in this series.
- By breaking down the address into its separate components, you gain a lot of flexibility in searching and sorting your database.
Hierarchical Databases
- Hierarchical databases take everything that we just learned about flat file databases, and adds a simple parent child relationship component.
- In a hierarchical database, you would link several tables together by a common thread or piece of information. Each parent table can have several child tables, but each child table can only have a single parent.
- Hierarchical databases can help illustrate where a particular piece fits into the entire system, and what comes above and below in the chain.
- Where hierarchical databases break down is when they need to model more complex relationships than just above and below.
- Anything that can't be illustrated with a branching tree diagram probably wouldn't be a good fit for a hierarchical database design.
Relational Databases
The relational database builds on the organizational principles of the flat file system and the connected nature of the hierarchical system, but adds the ability to connect multiple tables together without restriction on the number of parent and child relationships.
- This helps you get a more thorough picture of the system and more accurately capture the complexities of your data connections.
The main idea behind a relational database is that your data gets broken down into common themes, with one table dedicated to describing the records of each theme.
By using unique identifiers for each record, we can relate one table to another.
- These identifiers are called key fields and they are the glue that holds the entire system together.
- Each one of these represents key values that help keep your records separate in various database systems that are used every day throughout the world.
- The advantage to breaking down your data into separate tables is that you can start to combine your data back together again in lots of different configurations.
Components of Relational Databases
Data types
- By specifying the format that a valid column eventually will adhere to, we help make sure that the information is consistently entered.
Constraints
- We can tell the database that a particular column won't allow any duplicate values. Referential integrity
- Another rule that will help us ensure the validity and completeness of your database.
- Referential integrity means that if the database is expecting a relationship on a particular field, then the corresponding value must already exist in the parent table before it will allow a change to the child.
- Referential integrity will protect you from these types of phantom connections by checking the existence of the item you're referring to.
Structured Query Language
SQL is the standard language that relational databases use in order to create the data structures, enter and update data, and write queries to ask questions of the data set.
- Though the core SQL language is part of the American National Standards Institute or ANSI Standard, each DBFS vendor applies their own tweaks and enhancements to the base language in order to distinguish their product from everyone else's.
NoSQL
- In addition to files and traditional SQL databases, NoSQL databases are becoming a popular method of storing data.
- The motivation for NoSQL was driven by the big-data challenges of scale and performance.
- NoSQL relaxes some of the constraints (consistency, structure, etc.) that exist in SQL databases in exchange for performance and scalability.
Database Server
- A database server can either be a dedicated machine or a virtualized machine that is running the database management software.
Typically, you'll hear this referred to as an instance of the server and multiple instances or multiple separate installations can be installed on a single machine at the same time. That's because when installing the server software, you'll give the instance a unique name so they can function alongside multiple other instances without getting all tangled up.
Typically, users log in to the DBMS over a network and issue commands to the database server processes and then sends the results back to the users terminal for review.
- The database server software usually has some steep hardware requirements that go well beyond the typical desktop machine. Server-class hardware typically features
- extremely fast read and write disk speeds
- large amounts of memory
- more multicore processors than you would expect to be able to stuff into a single box.
- Very fast connections to the local network or internet.
Terminology
A relation schema is the logical definition of a table - it defines what the name of the table is, and what the name and type of each column is. It's like a plan or a blueprint. A database schema is the collection of relation schemas for a whole database.
A table is a structure with a bunch of rows (aka "tuples"), each of which has the attributes defined by the schema. Tables might also have indexes on them to aid in looking up values on certain columns.
A database is, formally, any collection of data. In this context, the database would be a collection of tables. A DBMS (Database Management System) is the software (like MySQL, SQL Server, Oracle, etc) that manages and runs a database.