Programming assignment sample - SPRING 2021 MIS | Xueba Lianmeng

Date: 2022-05-02

Sample NoSQL Database Project

NOTE TO SPRING 2021 MIS 64082 STUDENTS:
This sample paper from a former student is being provided to you as a general guide only to the type of
paper / project you might want to pursue on your own. No assumptions should be made on your part as
to the overall quality of this paper, or the grade received by the student. Further, I expect you not to
copy what this student has done, or to otherwise plagiarize any portion of his/her work in any way.
The student who completed this project has given me explicit permission to share it with you.
You DO NOT have permission to post it on any public course material sharing websites.
Further, you should not be contacting any former students or searching the internet for any other
completed papers to submit as though they were your own work. If caught doing so, you will
receive a score of 0 on the assignment, and potentially an automatic F in the course.

* * * * * * * * * * * * * * *

I have chosen to implement a NoSQL database for astronomical objects. NoSQL makes sense
for this purpose for several reasons. First, when making astronomical observations, you might want to
know information about the stars that you have focused in the telescope. The data stored about
stars varies from star to star because of the variety in how they form. Some stars have companion
stars that orbit one another, and some are on their own. The database should store a list of
companion stars so astronomers can identify interesting systems such as binary pairs.
Additionally, some stars have planets around them, so the database should store the
number of planets and some additional information about them. Finally, at least within our solar
system, there are moons around the planets that should be tracked. In addition to stars, there are
objects such as black holes and nebulae that are of interest to astronomers and should be considered
when developing a database.
Astronomers may want to look over all the data that is available on a set of stars. Currently,
these stars are recorded in many separate catalogs, and querying those catalogs is difficult. An
application that can store all known information about stars and can be easily queried is needed.
The use cases of the database are varied. I envision three use cases: backyard astronomy,
professional astronomy, and research.

1) Backyard astronomers would be interested in a small subset of stars and simple information
about them. A phone app could access the data and show users interesting objects. This use
case demands a high read capacity because the app could be used by millions of people.
There would be no writes to the astronomical objects in this use case.

2) Professional astronomers place much heavier demands on the database. They do not need
as many simultaneous connections (there are fewer professional astronomers).
Because their telescopes are orders of magnitude more powerful than backyard telescopes,
they will make larger bulk reads to include all the stars in their field of view.
These astronomers will also be able to upload their findings (creating new stars in
the database).

3) Researchers have an interesting use case. They operate like data analysts: they are
not adding data to the database but querying it to find answers to
their research questions. This requires that the database be able to run large query
jobs, potentially returning billions of records. The researchers do not need data that is
fully up to date, so they do not require the data to be consistent at all times.

NoSQL offers several advantages over a traditional relational database. One of the main
advantages is scalability. Scalability is one of the main concerns of this
project because there are approximately 200 billion stars in the Milky Way alone. The row count by
itself doesn’t present much of a problem for relational databases, but it does become an issue when
we start considering joins between the stars, planets, and moons tables. The volume of data and the
number of joins would quickly overwhelm a single server as more stars are added. Many NoSQL systems
handle this by relaxing strict consistency and assuming eventual consistency instead. Additionally,
reads can be very efficient because the data can be spread over a large number of servers, each of
which handles reads independently. NoSQL databases also do not require the data schema to be fixed
at the beginning. This is helpful because we do not know the full extent of the information that
might be stored in this database. Lastly, NoSQL can store the variety of datatypes that may
accompany this data.
I have chosen MongoDB because it offers an easy-to-use document database. MongoDB also
has tools such as MongoDB Compass, which allow users who are unfamiliar with programming or
command line interfaces to work with the data. There are other databases that would also work well,
such as Google Firebase, which offers a hosted document database. I chose MongoDB
over these other implementations because it has more support than its competitors and has a
community edition, which helps with limited funding. Other large-scale data platforms, such
as Hadoop, would handle a dataset of this size, but being batch-oriented they would not work as well
for situations where there are real-time users such as backyard astronomers.
MongoDB implements a document database. In MongoDB, data is stored in documents that
contain JSON-like fields. The structure can vary from one document to the next, which allows the
structure to change as the need arises. This is useful because
not all stars have planets and not all planets have moons. Thus, our documents sometimes contain
these extra fields but do not have to. It is also useful for this application because
the known information about stars is highly variable, so some stars will be missing data
that is still unknown to astronomers.
In addition to the structural flexibility of document databases, documents can contain a variety
of datatypes such as strings, decimals, and even other documents. This allows the complex structure of
one document to be contained within another, which has some unique advantages over
traditional databases, such as being able to get all the related data without having to join tables or
create subqueries. Thus, documents allow the flexibility to model the database after the real-world
structure of the data. For example, the hierarchical structure of stars, planets, and moons represents a
real-world case of nested data. This is one of the main advantages of a NoSQL database over a
traditional relational database.
In order to model the real-world nested data in this project, I have created a collection named
stellar_objects, which contains all the documents for stars, black holes, and nebulae. Because all
planets orbit stars, each star that has planets holds a list of planet documents.
Similarly, all moons orbit planets, so they are stored in a list of documents on each planet
that has moons. The ability to store nested documents is essential for this application: each
star is a document, planets are an array of documents stored on their parent star, and moons
are documents stored in an array on their parent planet.
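
As a rough illustration of the nesting described above, a single stellar_objects document might be written as the Python dictionary below, which is the form a driver such as PyMongo accepts. Only the collection name stellar_objects comes from this paper; every field name (name, type, spectral_type, planets, moons) is an assumption made for the sketch.

```python
# Hypothetical document shape. Only the collection name "stellar_objects"
# comes from the paper; all field names here are illustrative assumptions.
sun = {
    "name": "Sun",
    "type": "star",
    "spectral_type": "G2V",
    "planets": [
        {"name": "Venus", "type": "rock"},  # no moons, so no "moons" field at all
        {"name": "Earth", "type": "rock", "moons": [{"name": "Moon"}]},
        {"name": "Mars", "type": "rock",
         "moons": [{"name": "Phobos"}, {"name": "Deimos"}]},
    ],
}


def count_moons(star):
    """Count moons across all planets, tolerating missing arrays."""
    return sum(len(planet.get("moons", [])) for planet in star.get("planets", []))


moon_total = count_moons(sun)
print(moon_total)
```

Because absent data is simply left out, code that reads these documents has to tolerate missing fields, which is why the helper uses `.get()` with a default.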
Shown below are examples of the documents and some data presented in a query. The flexibility
that document databases allow means that these document structures can be modified
to accommodate new data if the current structure doesn’t account for that information. The flexibility
to change structure was the most important factor in choosing NoSQL for this project. This is because
as astronomers discover new stars, exoplanets, and maybe even exomoons, the database will need to
adjust to accommodate new data that might not always align with what we expect.

Stellar Objects:

Here is an example of what a stellar_object looks like:

In this picture we can see that a variety of data is available about each star. However,
the data varies between these stars based on the available information. For example, the second star
(Alpha Centauri A) has an unknown age, so that field is not listed. Alpha Centauri A also has two
companion stars (Alpha Centauri B and Proxima Centauri). This can be seen in the image below:

The companion stars are listed in an array of Object IDs, so the stars reference each other. The data
of these stars is not kept in this array, only the IDs; this is a case where additional lookups would
be required to obtain all the information about the companion stars.
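
The extra lookup step can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions: a dictionary keyed by _id stands in for the collection, short strings stand in for real ObjectId values, and the field name companion_stars and the helper companions_of are hypothetical.

```python
# Stand-in for the collection: a dict keyed by _id. Plain strings replace
# real ObjectId values so the sketch runs without a MongoDB driver.
stellar_objects = {
    "id_a": {"_id": "id_a", "name": "Alpha Centauri A",
             "companion_stars": ["id_b", "id_c"]},
    "id_b": {"_id": "id_b", "name": "Alpha Centauri B",
             "companion_stars": ["id_a", "id_c"]},
    "id_c": {"_id": "id_c", "name": "Proxima Centauri",
             "companion_stars": ["id_a", "id_b"]},
}


def companions_of(star_id, collection):
    """Resolve the referenced IDs into full documents (the second lookup)."""
    star = collection[star_id]
    return [collection[cid] for cid in star.get("companion_stars", [])
            if cid in collection]


names = [doc["name"] for doc in companions_of("id_a", stellar_objects)]
print(names)
```

Against a real MongoDB collection, the second step would typically be another find with an `$in` filter on `_id`, or a `$lookup` stage in an aggregation pipeline.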

A researcher might want to query for all the stars which have companion stars. This might
sound difficult at first but because the data is stored in an array, we can write a query to determine if
this array exists as follows:

Only stars that have a companion stars array will be shown here. This makes searching for
binary or higher-order star systems an easy task. A filter such as this could enable insights into what
types of stars are in binary star systems or whether binary systems have planets.
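
A filter of this kind might look like the sketch below. The dictionary is in the form PyMongo's find() accepts, and `$exists` is a standard MongoDB query operator; the field name companion_stars and the helper matches_exists are assumptions, and the helper only imitates the operator in plain Python so the example runs without a server.

```python
# Filter in the dict form PyMongo's find() accepts.
# "companion_stars" is an assumed field name.
has_companions_filter = {"companion_stars": {"$exists": True}}


def matches_exists(doc, query):
    """Evaluate a query made up only of {field: {"$exists": bool}} clauses."""
    return all((field in doc) == cond["$exists"] for field, cond in query.items())


docs = [
    {"name": "Alpha Centauri A", "companion_stars": ["id_b", "id_c"]},
    {"name": "Sun"},  # single star: the field is simply absent
]

multiples = [d["name"] for d in docs if matches_exists(d, has_companions_filter)]
print(multiples)
```

Note that `$exists` only checks that the field is present, so a star stored with an empty array would still match. If that can happen, a filter such as `{"companion_stars.0": {"$exists": True}}` requires at least one element.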

Planets:

Planets can be seen by looking into a stellar object which contains an array of planet documents. The
image below shows an example of this.

As can be seen in this example, Proxima Centauri b is a planet which orbits the star Proxima
Centauri. This star has only one planet but if it had more they could be stored in the planets array.

This is an example of a query for all stellar objects which have planets that are of type rock:

This query is made possible because the planets are stored in their parent star. This means that
complex queries, such as which star class has the highest occurrence of rocky planets, become an easy
filter on planet type and a count of star class.
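
The filter and the count by star class could be sketched as follows. The filter and pipeline are in the dict form PyMongo accepts and use standard operators (`$match`, `$group`, `$sum`, `$sort`); the field names spectral_class, planets, and type are assumptions, and the small sample list is illustrative only. The pure-Python lines reproduce the same logic so the example runs without a server.

```python
from collections import Counter

# Dot notation ("planets.type") matches if any element of the planets
# array has that value. Field names are assumptions.
rocky_filter = {"planets.type": "rock"}
rocky_by_class_pipeline = [
    {"$match": rocky_filter},
    {"$group": {"_id": "$spectral_class", "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
]

# Small illustrative sample, not a real catalog extract.
stars = [
    {"name": "Sun", "spectral_class": "G",
     "planets": [{"name": "Earth", "type": "rock"},
                 {"name": "Jupiter", "type": "gas"}]},
    {"name": "Proxima Centauri", "spectral_class": "M",
     "planets": [{"name": "Proxima Centauri b", "type": "rock"}]},
    {"name": "51 Pegasi", "spectral_class": "G",
     "planets": [{"name": "51 Pegasi b", "type": "gas"}]},
    {"name": "Alpha Centauri A", "spectral_class": "G"},  # no planets array
]


def has_rocky_planet(star):
    """Pure-Python equivalent of the {"planets.type": "rock"} filter."""
    return any(p.get("type") == "rock" for p in star.get("planets", []))


rocky_hosts = [s["name"] for s in stars if has_rocky_planet(s)]
class_counts = Counter(s["spectral_class"] for s in stars if has_rocky_planet(s))
print(rocky_hosts, dict(class_counts))
```

This counts stars that host at least one rocky planet. Counting the rocky planets themselves would need an `$unwind` on the planets array before grouping.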

Moons:

Moons can be found by looking into the planets of a star. Shown below is the array of moons on a
couple of planets.

This image shows a few interesting features of document databases. The most obvious is that
the arrays of moons are arrays of objects. Because the database does not require array elements to be
in any particular order, the planets can be added in any order, as seen above. Another thing to note
is that all the above information is stored on a single stellar object (the Sun); thus, when one writes
a query to get all information about the Sun, all the planets and moons are returned as well.

The key feature of this database is the hierarchy that it allows by nesting documents. The
entire hierarchy can be seen below:

In the above image we can see an example of the highest-level document in this collection: a
star. Inside it is an array of the planet(s) that orbit that star and should be associated with it.
Finally, at the lowest level we can see an array of the moons that orbit the associated planet(s).
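
To make the three levels concrete, the sketch below walks one nested document and flattens it into (star, planet, moon) rows, roughly what a relational design would need joins to produce. The field names and the helper walk are assumptions for illustration.

```python
def walk(star):
    """Yield (star, planet, moon) name triples; None marks an absent level."""
    planets = star.get("planets", [])
    if not planets:
        yield (star["name"], None, None)
    for planet in planets:
        moons = planet.get("moons", [])
        if not moons:
            yield (star["name"], planet["name"], None)
        for moon in moons:
            yield (star["name"], planet["name"], moon["name"])


# Assumed document shape: star -> planets array -> moons array.
sun = {
    "name": "Sun",
    "planets": [
        {"name": "Mercury"},  # no moons field
        {"name": "Earth", "moons": [{"name": "Moon"}]},
    ],
}

rows = list(walk(sun))
print(rows)
```

In MongoDB itself, a similar flattening can be done with two `$unwind` stages in an aggregation pipeline, using the `preserveNullAndEmptyArrays` option to keep planets without moons.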

Additional Information:

Since the focus of this database is stellar objects, most of the data will be stars and their
planets and moons. However, there can be other objects that don’t easily fit into this category. If we
were using a traditional database, several fields would be null and several attributes
would not apply to these other objects. Examples of alternative object types are shown below.
As can be seen, a NoSQL database can tailor the attributes to the situational demands.

This image shows that nebula objects are measured in light years and have catalog names; they
don’t have a value for mass or spectral type. Also shown is a black hole, which would not be considered
a star but can still be stored in the database without the need to create a different collection/table.
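
A minimal sketch of how such differently shaped documents might sit side by side in one collection follows. The field names are assumptions and the numeric values are approximate, for illustration only. The last lines count how many NULL cells a single wide relational table would need to hold the same three objects.

```python
# Three differently shaped documents for the same collection.
# Field names are assumptions; numeric values are approximate.
objects = [
    {"name": "Sun", "type": "star", "spectral_type": "G2V", "mass_solar": 1.0},
    {"name": "Orion Nebula", "type": "nebula",
     "catalog_names": ["M42", "NGC 1976"], "size_light_years": 24},
    {"name": "Sagittarius A*", "type": "black hole", "mass_solar": 4.3e6},
]

# Union of every field used by any document: the columns a single
# relational table would need to cover all three object types.
all_fields = set().union(*(obj.keys() for obj in objects))

# Cells that table would have to fill with NULL.
nulls_needed = sum(len(all_fields - set(obj)) for obj in objects)
print(sorted(all_fields), nulls_needed)
```

Each document carries only the attributes that apply to it, so nothing has to be stored for the fields it lacks.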