Back in October 2017, the team at Cal Poly Pomona injected a bunch of computer-simulated bubbles into the Milky Way Project with the goal of measuring the completeness of the bubbles catalog. Now, we’re uploading a new set of images for our citizen scientists to classify: some containing a new set of computer-simulated bubbles, and some containing transplanted bubbles.
Why Completeness Matters
Why are we going through all this effort? Just to recap: the goal of this new phase of the Milky Way Project is to evaluate the completeness of the bubble catalog. We know how many bubbles our volunteers identified, and want to quantify how many bubbles might have been missed. This is especially true for smaller, fainter bubbles or bubbles located in very busy regions of the Galaxy. These bubbles would likely be much more difficult to identify, both by volunteers and by machine learning algorithms. In order to measure how many bubbles might have been missed, we can place known bubbles at specified locations in the images, and see what percentage of our citizen scientist volunteers find them. For example, if a certain small, faint bubble is only identified 50% of the time by volunteers, then we can infer that other comparably small and faint bubbles were only identified about half of the time. This would mean that, although there may be a thousand such bubbles contained in the final catalog, the actual number of small, faint bubbles in the Galaxy is about twice as large as our catalog would imply.
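The arithmetic in the example above can be written out as a tiny sketch. The function name and the numbers are purely illustrative (they are not taken from our pipeline), and the sketch assumes that the recovery fraction measured from the known bubbles applies equally to all comparable real bubbles.

```python
def estimate_true_count(catalog_count, recovery_fraction):
    """Estimate how many bubbles really exist, given how many were cataloged
    and the fraction of comparable known bubbles that volunteers recovered."""
    if not 0.0 < recovery_fraction <= 1.0:
        raise ValueError("recovery_fraction must be in the interval (0, 1]")
    return catalog_count / recovery_fraction


# Illustrative numbers from the text: 1000 cataloged small, faint bubbles,
# with similar known bubbles recovered 50% of the time.
estimated = estimate_true_count(1000, 0.5)
print(estimated)  # 2000.0
```

A recovery fraction of 1.0 leaves the catalog count unchanged, which is what we would expect for bubbles that nobody misses.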
Our first attempt from October 2017 revealed that some of the simulated bubbles were too easy to identify. We’d like to thank our volunteers (especially @BLGoodwin) for pointing this out to us! It’s impossible to gauge completeness with a sample of bubbles that are easy to spot! So, we’re going for a second iteration that includes a brand new set of simulated bubbles and transplanted bubbles.
Transplanted vs. Simulated Bubbles
Transplanted bubbles are real bubbles that have been previously found by Milky Way Project volunteers (and will be included in the Data Release 2 catalog and paper). We give them a random rotation, and in some cases scale them to be smaller and fainter (in other words – what the bubble would look like if it were farther away), and then “graft” the bubble back into a Milky Way Project image. Numerous real bubbles have been selected, which span a large range in angular sizes, brightness, shape, and overall complexity. Some appear to have formed in relative isolation, while others are located in more actively star-forming regions of the Galaxy. Roughly 700 transplanted bubbles have been grafted back into the Milky Way Project and will form the foundation for our completeness study.
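For the curious, the rotate, shrink, dim, and graft steps described above might look roughly like the sketch below. This is not our actual transplanting code: the function name `transplant` and all of its parameters are hypothetical, it works on a single-band 2D array rather than a color image, it only rotates by multiples of 90 degrees, and it assumes that "grafting" means adding the bubble's brightness on top of the background.

```python
import numpy as np


def transplant(background, cutout, row, col, shrink=1, dim=1.0, rng=None):
    """Graft a bubble cutout into a background image (single-band sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    # Random rotation, restricted to multiples of 90 degrees for simplicity.
    patch = np.rot90(cutout, k=int(rng.integers(0, 4)))
    # Shrink by an integer factor by averaging shrink x shrink pixel blocks.
    h = (patch.shape[0] // shrink) * shrink
    w = (patch.shape[1] // shrink) * shrink
    patch = patch[:h, :w].reshape(h // shrink, shrink, w // shrink, shrink)
    patch = patch.mean(axis=(1, 3))
    # Make the bubble fainter, as if it were farther away.
    patch = patch * dim
    ph, pw = patch.shape
    if row < 0 or col < 0 or row + ph > background.shape[0] or col + pw > background.shape[1]:
        raise ValueError("the patch does not fit inside the background image")
    # Assumption: the bubble's emission is simply added to the background.
    out = background.astype(float)
    out[row:row + ph, col:col + pw] += patch
    return out


# Toy example: a uniform 4x4 "bubble" shrunk by 2 and dimmed by half,
# grafted into an empty 10x10 image.
result = transplant(np.zeros((10, 10)), np.ones((4, 4)), 3, 3,
                    shrink=2, dim=0.5, rng=np.random.default_rng(0))
print(result[3:5, 3:5])
```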
We are also including a second set of computer-simulated bubbles in our second iteration of this project. Just like with the first round, some of these may appear obvious – and that’s okay! Understanding why some of the computer-simulated bubbles look different from the observations (while other simulated bubbles seem to “pass” the visual inspection test) will provide our theorist colleagues with valuable information that can help improve their models.
This image contains both real and transplanted bubbles. Can you tell which is which?
This is Dr. Breanna Binder, a postdoctoral researcher at Cal Poly Pomona who joined the Milky Way Project team over the last year. I’m hoping to use the classification powers of all you awesome citizen scientists to find… computer-generated bubbles. Read on to find out why!
Thanks to the hard work and dedication of all you volunteers, the Milky Way Project has produced catalogs of thousands of bubbles. These bubbles come in a variety of shapes and sizes and are located in many different types of environments – from the hectic downtown of the Galactic center to the relatively sparse suburbs of the outer disk. With this catalog in hand, we naturally want to start using it to ask scientific questions: What causes some bubbles to be circular in shape, but others oval? Do we find small bubbles in certain environments but not others? And when we do find interesting differences or correlations in our catalog, we of course have to ask: Why?
Before we can answer these questions, there’s another we need to consider first: How do we know if there are bubbles missing from our catalog? It seems reasonable that the smaller or fainter a bubble appears in an image, the easier it will be for our volunteers to miss. Similarly, a bubble in a very dense environment (like the left-hand picture below) might get lost in the crowd of its brighter or larger neighbors, even though that same bubble would be easily identified if it were isolated from other interesting star formation features (like in the right-hand image below).
A busy, star-forming environment
A sparsely star-forming environment
If we want to interpret the trends observed in our bubble catalog, we need to understand the completeness of our catalog. Let’s take the sizes of bubbles as an example: big bubbles are easier to identify and measure than small bubbles. So, we might expect that our volunteers were able to identify all of the bubbles bigger than a certain size (100% completeness) but were only able to find a third of bubbles smaller than some other size (33% completeness). Not only would we like to be able to estimate the total number of bubbles we’re missing, but we want to know how many we’re missing as a function of some interesting parameter (in this case, size).
How can we possibly do this? After all, we’re only able to identify… well, what we’re able to identify! So how can we measure the completeness of our bubble catalog?
The solution comes from computer simulations. We’ve partnered up with Dr. Christine Koepferl at the University of St. Andrews, who creates realistic, computer-generated synthetic bubbles that can be viewed at multiple distances and from many different viewing angles. These synthetic bubbles are so realistic-looking that even Milky Way Project technical lead Tharindu Jayasinghe and bow shock guru Don Dixon couldn’t tell them apart from the real thing!
I’ve taken Dr. Koepferl’s synthetic bubbles and added them to the Milky Way Project images. Since these are computer-generated images, I can specify exactly how many bubbles to add, where to place them in the images, how big they should appear, whether or not the bubble should be rotated or viewed from a different angle, etc. These modified observations – some of which contain a synthetic bubble – are then loaded back into the Milky Way Project.
This is the key to measuring the completeness of a survey! We know everything about the true distribution of the synthetic bubbles, so we can see how accurately our citizen scientist volunteers are able to find and measure them. If our volunteers find all the synthetic bubbles larger than a certain size, then we can be pretty confident they found all the real bubbles larger than that, too. However, if they are only able to pick out a third of small bubbles, then it’s likely that the small bubbles in our catalog only represent a third of the total number of small bubbles out there.
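To make the idea of completeness "as a function of size" concrete, here is a minimal sketch of the bookkeeping. The function name, the size units, and the example numbers are all made up for illustration; the real analysis involves many more synthetic bubbles and more parameters than size alone.

```python
def completeness_by_size(synthetic, bin_edges):
    """Fraction of injected synthetic bubbles recovered in each size bin.

    synthetic: iterable of (size, recovered) pairs, one per injected bubble.
    bin_edges: increasing list of bin boundaries in the same units as size.
    """
    bins = list(zip(bin_edges[:-1], bin_edges[1:]))
    injected = {b: 0 for b in bins}
    found = {b: 0 for b in bins}
    for size, recovered in synthetic:
        for lo, hi in bins:
            if lo <= size < hi:
                injected[(lo, hi)] += 1
                found[(lo, hi)] += 1 if recovered else 0
                break
    # Bins with no injected bubbles have undefined completeness and are skipped.
    return {b: found[b] / injected[b] for b in bins if injected[b] > 0}


# Made-up example: three small synthetic bubbles (one found) and
# two large ones (both found).
synthetic = [(0.5, False), (0.7, True), (0.9, False), (2.0, True), (3.0, True)]
completeness = completeness_by_size(synthetic, [0.0, 1.0, 5.0])
for (lo, hi), fraction in completeness.items():
    print(f"size {lo}-{hi}: {fraction:.0%} complete")
```

In this toy case the small-size bin comes out one third complete, so the number of small real bubbles in the catalog would be multiplied by about three to estimate how many are really out there.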
This information is really important for estimating the total number of bubbles in the Milky Way, and could have other very interesting implications for the distribution of bubbles in our Galaxy. This sort of completeness study was not conducted in the earlier versions of the Milky Way Project, and is not normally done in other citizen science initiatives either. We hope that you will continue to search for bubbles – both real and computer-generated – to help the Milky Way Project once again push the boundaries of what citizen science can accomplish!
What’s the first thing that comes to mind when you hear the term fubble? Perhaps it’s the popular children’s toy?
To us at the Milky Way Project (MWP), ‘fubble’ means FAKE BUBBLE. Yes, you heard that right. Fake bubbles = ‘fubbles’. We would trademark this, but… it’s already taken. OK, so what are MWP fubbles?
We have come to notice that certain patterns seen in our images, which resemble smoke rings, dark clouds and bright stars, often end up being classified as bubbles.
In order to avoid making such classifications, it is important to be clear about what constitutes a bubble and what does not.
Good bubble candidates have the following characteristics (see the example images), listed in decreasing order of importance:
1. Sharp green or yellow-green inner rims (the single most important factor).
2. Less bright green emission (in the 8 micron band) within the bubble, but not a corresponding lack of stars inside the bubble (see “Dark Clouds” below).
3. Usually: Some evidence of ‘red emission’ (in the 24 micron band) within the bubble. If this emission is lacking, the bubble had better be very sharp (i.e. if you’re going to violate rule 3, make sure rule 1 holds).
It is important to understand that while some good bubble candidates might bend these rules a little bit, the majority of the bubbles seen in our images fit these criteria quite well.
An example of a ‘good’ bubble candidate
Two excellent bubble candidates.
Now that we understand what a bubble candidate ought to look like, let’s explore a few instances where one might easily mistake a non-bubble object for a bubble.
Two examples of smoke rings identified by our volunteers.
These are interesting patterns in our images that closely resemble bubbles. What sets them apart from bubbles is that they are usually hazy, not sharp (violating Rule 1) and lack (red) emission in the 24 micron band (Rule 3, which cannot be violated unless Rule 1 holds). Distinguishing between a smoke ring and a bubble is tricky, and it is totally fine if you decide to classify a few as possible bubble candidates. That said, being stringent with the criteria mentioned above should help eliminate most doubts and help us make a cleaner bubble catalog.
Two examples of dark clouds identified by our volunteers.
While dark clouds are associated with star formation, they are certainly not bubbles! We have come across a few instances where many people classified dark clouds as bubbles. Even though these objects satisfy the first part of Rule 2 (lack of green emission within candidate), they violate the second part of Rule 2 (stars are missing, which is the hallmark of a dark cloud) and usually violate Rule 1. Dark clouds should not be classified as bubbles.
Bright / Saturated stars
An example of a bright star that should not be classified as a bubble
We are still finding bright, red stars classified as bubbles (or yellowballs, or other objects). While they are spectacular and tempting to mark, classifying bright stars in MWP is never a good idea (unless you want to mark an associated artifact).
Bubbles abound in many of our images. They vary in size and shape but they almost always obey the 3 rules that I’ve described above. Among the wide array of bubbles in our images, we also find tricksters/fubbles that can be deceiving. It takes some practice (that’s why we have a training workflow), but with practice you will definitely learn to catch bubbles and avoid fubbles.
Thank you for Supporting this Research!
Finding bubbles in our images takes time and effort, and thanks to all the work from our beloved MWP volunteers, we are making great progress in cataloging them. Thank you again for your hard work! We are glad to have you on board.
I would like to share some exciting news with you: The MWP was recently featured in Polytrends, a magazine produced by our home institution, Cal Poly Pomona! What makes this particular article so special is that four of our (awesome) volunteers were also featured in it. They were interviewed and shared some of their thoughts on the MWP.
The official Polytrends magazine for the Winter 2017 quarter can be found here.
As we always reiterate, we owe it all to you, our volunteers! Thanks to your effort, we’ve been able to achieve many great things over the years.
I would also like to request your help in achieving our target of a million classifications by March 2017. Achieving this goal will help us improve our data reduction tools and create better catalogs of objects.
Keep an eye out for another blog post (on Fubbles!) over the coming week!
During the first week of January 2017, MWP team members Tharindu Jayasinghe and Don Dixon, along with our project lead, Dr. Matthew Povich, travelled to Grapevine, Texas for the 229th meeting of the American Astronomical Society (AAS). The AAS is the major organization of professional astronomers in the United States. AAS general meetings are held twice a year in the months of January and June. These meetings are frequented by experts in their fields and are excellent opportunities for young researchers to network with colleagues and share the latest astronomical research.
Both Tharindu and Don presented scientific posters describing their work with MWP data, focusing on the DR2 bubble and bow shock catalogs. They describe their posters below:
I presented my work on the data reduction pipeline for the MWP along with the preliminary DR2 “alpha” bubbles catalog. Citizen scientists did a great job cataloging 10 times as many bubbles as those found by professional astronomers back in 2012 in our first data release (DR1). As I mentioned in my previous post:
“We need to publish an improved, DR2 bubble catalog. This catalog will be produced through the combined efforts of MWP volunteers in both Phase Two and Phase Three — 4.4 million total classifications. This is very exciting for us! Thanks to the improved ellipse tool, we are now able to obtain better measurements of bubble parameters. By zooming in further, smaller bubbles and structures are better resolved and classified. Essentially, we are combining the strengths of Phase 1 (“green+red” color scheme) and Phase 2 (improved ellipse tool, higher zoom) in one final run through the Spitzer images during Phase 3.”
With our tools in hand, we have already created a DR2 alpha bubble catalog that uses MWP classification data from 2012-2015. I compared our DR2 alpha bubble catalog to our DR1 bubble catalog (Simpson et al. 2012) and what we found was interesting—to say the least! Systematically, most DR2 bubbles were smaller and more elliptical than the corresponding DR1 bubbles—see three examples in the image below (sourced from my poster). This is good news! We think that we got rid of a systematic inaccuracy in our DR1 bubble shapes.
We noticed that most DR1 bubbles were circular, a property attributed to the elliptical annulus tool, which defaulted to circles. Post-2012, the MWP used an ellipse tool to identify bubbles. This tool defaults to an ellipse (oval), and we find that the bubbles cataloged after the introduction of the ellipse tool were more elliptical and better represented the bubbles that MWP volunteers were drawing.
The same argument holds for the sizes of the bubbles. DR1 bubbles were sometimes bigger than the bubbles visible in our images. We blame the elliptical annulus tool for this issue, because it was tricky to use. We now find that the DR2 bubbles are systematically smaller and better fit the bubbles visible in our images. This is advantageous for those astronomers looking for accurate bubble size and shape parameters. Our DR2 bubble catalog should really hit the mark!
In other news, I also spoke of a reliability metric used to score bubbles. This is something that the team is working on right now. A future blog post describing “fubbles” (fake bubbles) and bubble reliability is in the works. Keep an eye out for this in the coming weeks!
My poster presentation showcased the preliminary results of the MWP bow shock classifications. Since MWP: Phoenix is the first version in which volunteers can easily identify bow shocks, we were all excited to see the potential of the citizen scientists to find and classify these objects. We compared the emerging MWP DR2 bow shock catalog to a published catalog created by a collaboration of students and professors across several institutions (Kobulnicky et al. 2016). That catalog contained 709 bow shock candidates. In only the first three months since launch, MWP: Phoenix rediscovered 196 of them and discovered 93 new quality candidates. These numbers are increasing every day as our volunteers continue to make new classifications.
These results showcase the ability MWP volunteers have to catch many of the bow shocks that were missed by our team of professional astronomers and students. We are optimistic that the MWP will rediscover a large majority of the previously-cataloged bow shocks and contribute hundreds more to create the largest Galactic bow shock catalog ever assembled! This catalog will be a powerful tool for astronomers studying massive stars and the interstellar medium. This is all possible thanks to the citizen scientists who volunteered to be a part of the project. Thanks for all your help, and keep up the great work!
This is Don Dixon, a student researcher at Cal Poly Pomona. I have spent the past year working with a team of researchers to build a catalog of infrared stellar wind bow shock nebulae. In this post, I will talk about how the Milky Way Project is allowing for a more comprehensive catalog of bow shocks and present the very first results of this effort.
Building a Milky Way Project (MWP) Enhanced Bow Shock Catalog: With the arrival of MWP version 3.0 (also known as Phoenix) our volunteers now have a far greater ability to find previously elusive bow shock nebulae. The project has a simple tool that enables volunteers to draw a polygon around the bow shock arc and choose what they believe to be the driving star with a reticle. This part of MWP: Phoenix is very important to us, because it is the first implementation of citizen science to find new bow shocks. With new MWP discoveries in addition to those made by previous researchers we will expand upon the largest existing catalog of 709 bow shocks, recently published by members of the MWP science team (Kobulnicky et al. 2016). This expanded catalog will be a powerful tool for the stellar astronomy community.
New Citizen Science Bow Shock Candidates. In only the first month, the volunteers of MWP: Phoenix have proven how much they can contribute to finding bow shock nebulae. Volunteers have already flagged hundreds of bow shock candidates, many of which were previously discovered, and I have confirmed 20 high-quality candidates that have not been previously discovered!
Collection of the first 20 never before found bow shocks from Milky Way Project!
Volunteers are finding bow shocks at a much faster rate than we could hope for and we are all very excited to see more found!
Tricky Classifications and Some Guidelines to Help. While examining some of the classifications made in the Milky Way Project I noticed that classifying these bow shocks is not always straightforward. Here are two guidelines to make finding and classifying them a little easier:
There should be a red or orange, well-defined arc distinguishable from the environment around it. The arc does not have to be perfect, and it could even look more like a bean. But without a clear “bend” to make an arc, or if the red-orange emission appears equally on both sides of the star, you should not classify a bow shock.
There must be a star centered behind the arc, generally no further away than the length of the arc. This means the brightest star in the image is not responsible for the arc if it is too far away from it. Sometimes there may be more than one star that could work. I generally choose the most centered star that is reasonably bright, but in the really tough cases it is okay to go with your gut. When we analyze and review the classifications, we will note cases where MWP volunteers found more than one potential driving star for the same bow shock.
A Very Special Case: A Bow Shock inside a Bubble? There are cases when classifying where it may look like a bow shock is inside a bubble. This is one of the harder cases to be confident about, and we recommend that you classify it as a bow shock if it meets the same criteria as those outside of bubbles and bring it to Talk so everyone can discuss it. And don’t forget to classify the bubble around the bow shock too!
Thank you for Supporting this Research! Finding bow shocks in the Milky Way is a daunting task (I know from personal experience) and thanks to all the work from the MWP volunteers we are making great progress in cataloging them. Thank you all for taking the time and effort to help in discovering these objects. Don’t forget it is because of you that we can do such powerful and exciting science. Keep up the great work!
This is Tharindu Jayasinghe – the technical lead for the latest release of the Milky Way Project (MWP). In this blog post, I will revisit some of the greatest hits from the history of the MWP and give you a preview of our latest science. If you haven’t done so already, read the first installment of this brand new MWP blog.
The birth of the MWP
The MWP started off as the ninth (yes, we were one of the earliest!) Zooniverse project in December 2010. Inspired by ‘The Bubbling Galactic Disk’, a paper published in 2006 by a group of astronomers who visually inspected the Spitzer Space Telescope survey images in the first hunt for Galactic infrared bubbles, the Milky Way Project was born. The earliest visual inspections of this vast data set were done by just a small number of astronomers, most of whom were undergraduate students like myself. Their efforts culminated in two bubble catalogs listing roughly 700 bubbles. You might be wondering, “Why do we need the MWP if a few undergraduates can do all this work?” The answer to this question captures the essence of citizen science. Thousands of volunteer “citizen scientists” going through the same dataset will spot most, if not all, of the objects visible in this dataset, while a handful of even highly-trained astronomers will miss many objects. Of course, different citizen scientists might disagree on any given classification. Take, for example, a bubble seen in the MWP data. I might draw this bubble slightly larger than what you would draw if you were asked to make a classification. Is this problematic? No, not in the least! The consensus of the crowd is what’s critical in citizen science, and, as you will see, different classifications average out very nicely to give us what we need.
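As a toy illustration of how differing drawings of the same bubble can "average out", consider the sketch below. It is not the MWP aggregation pipeline: the function name and the dictionary keys are hypothetical, and it simply takes the mean of each ellipse parameter across volunteers who marked the same bubble.

```python
from statistics import mean


def consensus_ellipse(drawings):
    """Average several volunteers' ellipse drawings of the same bubble.

    drawings: list of dicts with hypothetical keys x, y (center position)
    and semi_major, semi_minor (axis lengths), all in pixels.
    """
    if not drawings:
        raise ValueError("need at least one drawing")
    keys = ("x", "y", "semi_major", "semi_minor")
    return {key: mean(d[key] for d in drawings) for key in keys}


# Three volunteers draw the same bubble slightly differently.
drawings = [
    {"x": 10.0, "y": 20.0, "semi_major": 5.0, "semi_minor": 4.0},
    {"x": 12.0, "y": 21.0, "semi_major": 6.0, "semi_minor": 4.5},
    {"x": 11.0, "y": 19.0, "semi_major": 5.5, "semi_minor": 3.5},
]
consensus = consensus_ellipse(drawings)
print(consensus)
```

A real pipeline also has to decide which drawings belong to the same bubble in the first place, which this sketch takes as given.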
The initial launch of the MWP was hugely successful! We drew in over 35,000 volunteers from all over the world. After several months and 520k bubble classifications, the first phase of the MWP ended. The MWP technical lead at the time, Dr. Robert Simpson, wrote code to aggregate all these classifications into one final bubble catalog. This “DR1” catalog of 5,106 bubbles was published along with the first MWP paper. This increased the total number of cataloged bubbles by an order of magnitude. This was an amazing feat and spawned numerous follow-up papers (those by the core MWP team are described on our Results page). All was not perfect, however. Several problems, including redundancy, were seen in the DR1 catalog. Correcting these problems is critical to meeting our current science goals.
A second dose of the Milky Way Project
Shortly after our DR1 bubble catalog was published, we updated the classification interface, released new image sets (in a new color scheme), and improved upon the classification tools that were available in the first phase of the project. I will now describe the changes that were made in the second phase of the MWP and why they have turned out to be especially important.
So, what changed?
For those of you who have been with us during the first phase, you might have encountered the ‘elliptical annulus’ tool when classifying bubbles. I’ve included a figure to jog your memory. The new ‘ellipse’ tool available in phase 2 was a big change from the annulus tool. The annulus tool was difficult to use: it was by default a circle, changing the inner/outer radii was cumbersome and fitting it to match the shape was harder than it needed to be. By contrast, the ellipse tool makes it easier to draw bubbles and measure sizes and shapes more accurately.
Another drastic change made in phase 2 was the type of images used to make classifications. If you’ve used MWP in the past month, you are familiar with the stunning “green+red” color scheme used in the current images. This was the same color scheme that was used in the first phase of the project. The second phase of the Milky Way Project shifted to a “pink/red” color scheme. Why make this change? Viewing the Galaxy with different combinations of image data (corresponding to our two different color schemes) can help identify structures best seen in only one of the color schemes. Take for example EGOs (Extended Green Objects): these were only seen in the “pink/red” color scheme. Another excellent example would be the bow shocks that are easily seen in the “green+red” color scheme but rarely spotted in the other. Moving to the “pink/red” color scheme also meant that images from certain parts of the Milky Way that were not available in the first phase due to a lack of survey coverage at 24 µm could be included. This allowed MWP volunteers to map more of the Galaxy.
What did I mean by “lack of survey coverage”? To make the images in the “green+red” color scheme, we use data from two different surveys: MIPSGAL and GLIMPSE. The MIPSGAL data are responsible for the lovely hues of red seen in the bow shocks produced by the massive runaway stars and the distinctive color of “yellowballs”. But GLIMPSE (and two companion surveys, Vela-Carina and GLIMPSE-3D) together surveyed more of the Galaxy than MIPSGAL did. This enabled us to produce images in the “pink/red” color scheme out of these extended data sets.
Our image-making algorithm, created by the lead scientist of the MWP, Dr. Matthew Povich, had the capability to produce images at twice the zoom level of the previous version. The highest possible zoom level available in the first phase of the MWP was 0.3° x 0.15°. In the second phase, we doubled our maximum zoom level (producing images at 0.15° x 0.075°). This had a few implications. The number of images in MWP phase 2 was four times greater than in phase 1 (but that was OK, because by then we knew our volunteers could handle the increased workload). Smaller structures were more easily seen in the images with the highest zoom level, and this allowed citizen scientists to identify new structures and improve upon the measurement of existing structures previously catalogued by the MWP.
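If you are wondering where the factor of four comes from: halving both the width and the height of each image means four times as many images are needed to cover the same patch of sky. Here is a quick check, assuming for simplicity that images tile the sky without overlapping (the function name is just for illustration):

```python
from fractions import Fraction


def tile_ratio(old_size, new_size):
    """How many new images cover the sky area of one old image.

    Sizes are (width, height) in degrees, given as strings so that the
    arithmetic is exact. Assumes tiles do not overlap.
    """
    old_w, old_h = (Fraction(s) for s in old_size)
    new_w, new_h = (Fraction(s) for s in new_size)
    return (old_w * old_h) / (new_w * new_h)


ratio = tile_ratio(("0.3", "0.15"), ("0.15", "0.075"))
print(ratio)  # 4
```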
Bringing us back to the discussion about the second phase of the MWP: the project continued, now pink/red, and three years later, in 2015, we ran out of data and the MWP went on hiatus. Dr. Robert Simpson took up a job at Google and my mentor, Dr. Matthew Povich, took the helm. As described in Dr. Povich’s previous post, after a year-long hiatus the MWP resumed its operations behind the scenes as I joined the team and began development on the current version. Alongside this, I undertook the task of analyzing 3 years’ worth of classifications from the second phase of the Milky Way Project.
Aggregating your classifications from MWP Phase Two
I completely rewrote the MWP classification aggregation pipeline and work began on understanding the 2 million classifications made by MWP volunteers from 2012-2015. We found some pretty amazing results. The ellipse tool significantly changed how volunteers made measurements of bubble sizes. The drawings that were made in phase two were more accurate and representative of the actual shapes of the bubbles (see the comparison figure below). Over 24,000 volunteers helped us during phase two and made over 770,000 bubble classifications. From this data, we were able to obtain roughly 4400 bubble candidates in our “DR2 alpha” catalog. It should be noted, however, that we are continuously testing and making modifications to this pipeline in order to avoid the issues that were encountered in the DR1 catalog. You may wonder why our DR2 alpha catalog contains fewer bubbles (4400) than the DR1 catalog (5100). There are two reasons: (1) Bubbles are more easily seen in the “green+red” color scheme (DR1) than in the “pink/red” scheme (DR2), and (2) we have made progress getting rid of the duplicated bubbles found in the DR1 catalog.
The Phoenix rises from its ashes: MWP Phase Three
MWP Phase Two seems to have done its job. “So why do you still need my help?” you might ask. Excellent question! There are three main goals:
We want to build and publish the first-ever citizen-science enabled bow shock catalog. We did not ask people to identify bow shocks in Phase One, and even if we had, we didn’t have high enough zoom levels in the image. We did ask people to look for bow shocks in Phase Two, but they were difficult or impossible to spot in the pink/red color scheme.
We need to publish an improved, DR2 bubble catalog. This catalog will be produced through the combined efforts of MWP volunteers in both Phase Two and Phase Three — 4.4 million total classifications. This is very exciting for us! Thanks to the improved ellipse tool, we are now able to obtain better measurements of bubble parameters. By zooming in further, smaller bubbles and structures are better resolved and classified. Essentially, we are combining the strengths of Phase 1 (“green+red” color scheme) and Phase 2 (improved ellipse tool, higher zoom) in one final run through the Spitzer images during Phase 3.
Finally, with the higher zoom level, green+red color scheme, and a dedicated “yellowball” tool, an improved and more comprehensive yellowball catalog is also in the works.
My fellow MWP enthusiasts, great things await all of us. Keep reading our blog for more updates, and most importantly, keep on classifying! Stay tuned for the next installment of the MWP blog, where Don M. Dixon, a science team member, shares some exciting information about the bow shock search in MWP phase 3. I cannot stress enough how thankful we are to all of you, our amazing volunteers, for all your work in making the MWP a success. We owe it all to your efforts. Thank you for helping us better understand our home Galaxy.