Kickstarter Data Collection

(Please check the pdf attached, the information here is a simple copy-paste of the info in the pdf file)

We invite you to read this RFP carefully. Please share which aspects of the project you are familiar with or have experience in. Proposals that are generic will not be considered. Also, please tell us which parts of the project you expect to find most challenging, and how you plan to address them.
Any deviation from this RFP must be agreed upon in writing; otherwise it will be considered a violation of our agreement.
General Information
The requirements are to collect both i) campaign-level and ii) creator-level information on crowdfunding campaigns on Kickstarter.com. We will then use this dataset to conduct research on the factors shaping success in crowdfunding.

A large part of the work involves completing our existing data collection efforts and extending them as necessary, while ensuring that the data collection process and code can be replicated by other researchers. We are therefore seeking individuals with the following qualifications:
• Prior experience in web crawling, ideally from Kickstarter and other crowdfunding sources.
• Skills in merging and manipulating datasets.
• Knowledge of how to optimize data storage and access for future analysis, with an ability to provide guidance on data management best practices.
• Willingness and ability to clearly document and share decisions, code, and processes, so that we can run the code ourselves after the project is completed.
• Skills in automating video transcripts are preferred.
• Skills in converting video content to textual descriptions using AI models are considered a plus.
• Lastly, we need a detail-oriented person who systematically checks the quality of their data outputs before submitting them to us. For example, for each milestone you can share code or the results of post-hoc analyses that we can replicate or run independently to verify the completeness, accuracy, and overall quality of your work.
Documentation and Storage: Scripts for each step and process should be well-documented and organized in a Google Drive folder for the project. All outputs and accompanying documentation should also be included and easily accessible within the folder.

Phases: The project includes four phases, each with one or more milestones. A potential fifth phase is contingent upon our evolving needs, such as the development of a longitudinal dataset with a set of Kickstarter campaigns observed over time. A brief summary of each phase is as follows:
• Phase 1: Complete our existing dataset of Kickstarter data
• Phase 2: Expand the dataset with recent Kickstarter campaigns
• Phase 3: Expand the dataset with representative images
• Phase 4: Expand the dataset with transcripts of campaign videos

Expectations for all phases
• We expect you to communicate deadlines and provide estimated completion times for each task. Our expected completion times are as follows:
o Phase 1: November 10, 2024
o Phase 2: November 15, 2024
o Phase 3: November 22, 2024
o Phase 4: November 29, 2024
• It is crucial that you provide comprehensive documentation and well-commented, replicable code. This will enable our research team and others to understand and be able to reproduce your process.
• It is essential to implement thorough data quality checks for all outputs and scraped data. Please provide detailed reports to assess data quality, along with the code used to generate these reports, to our team.
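To illustrate the kind of data quality report we have in mind, here is a minimal sketch rather than a required implementation. It counts missing values per field and lists duplicate campaign IDs. The field names (`campaign_id`, `campaign_url`, `funds_raised_usd`, `backers`) and the `quality_report` function are hypothetical; the actual column names and checks are up to you.

```python
from collections import Counter

# Hypothetical field names, used for illustration only.
REQUIRED_FIELDS = ["campaign_id", "campaign_url", "funds_raised_usd", "backers"]


def quality_report(records):
    """Summarize missing values and duplicate IDs in a list of dict records."""
    total = len(records)
    # Count records where each required field is absent, None, or an empty string.
    # A value of 0 (e.g., zero backers) is treated as present, not missing.
    missing = {
        field: sum(1 for r in records if r.get(field) in (None, ""))
        for field in REQUIRED_FIELDS
    }
    # Campaign IDs that appear more than once.
    id_counts = Counter(r.get("campaign_id") for r in records)
    duplicates = sorted(
        cid for cid, n in id_counts.items() if cid is not None and n > 1
    )
    return {"total": total, "missing": missing, "duplicate_ids": duplicates}


if __name__ == "__main__":
    sample = [
        {"campaign_id": 1, "campaign_url": "https://example.com/a",
         "funds_raised_usd": 1200.0, "backers": 30},
        {"campaign_id": 1, "campaign_url": "https://example.com/a",
         "funds_raised_usd": 1200.0, "backers": 30},
        {"campaign_id": 2, "campaign_url": "",
         "funds_raised_usd": None, "backers": 0},
    ]
    print(quality_report(sample))
```

A report of this shape, delivered together with the code that produced it, lets us rerun the same checks on our side for each milestone.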

Deliverables:
• Data frames as specified in each milestone:
o The dataframes should be optimized for future analysis with R. Our current expectation is an individual dataframe for each milestone. Other formats are accepted and welcome, but the ability to provide the final datasets as R dataframes is essential.
o We expect you to consult us on better data storage and access. Please share your strategies for efficiently accessing and managing the large dataset of this project. Our goal is to avoid issues with basic tasks like running calculations, filtering, or merging parts of the dataset due to its size. Ensure the data can be easily sliced, computations can be run, and algorithms can be applied based on its structure. Additionally, please provide tips on improving our ability to handle and analyze the data.
• Documentation:
o Provide detailed documentation and code for each dataframe, including any preprocessing steps taken.
o Include any validation measures that have been implemented to ensure data accuracy.
• Scripts for Data Collection:
o Code and accompanying documents should clearly describe any intermediate processes and important decisions made.
• Image Files and Video Transcripts:
o Image files (Phase 3) and video transcripts (Phase 4) along with their matching campaign IDs, as specified.
• Replicability:
o It is essential that the scripts be replicable.

Phase 1: Kickstarter campaign/creator information for existing dataset
The goal of this phase is to expand our existing dataset of Kickstarter campaigns and their creator information. We already have the URLs of 79,000 Kickstarter projects that ran between January 2009 and April 2013.
Milestone 1.1: Kickstarter Campaigns with the Provided URLs
For the provided campaign URLs, the requirement is to collect the following information. Please refer to Figures 1 and 2 for reference.
1) Campaign ID: The unique identifier. Each campaign should have a unique ID. It is preferred to use the same ID that Kickstarter uses, which may be embedded in an HTML element.
2) Campaign URL
3) Campaign name
4) Funds raised: The total amount of funds raised for the campaign in USD.
5) Backers: The total number of backers for the campaign.
6) Location: The location of the campaign.
7) US campaign: 1 if the campaign is located in the US, 0 otherwise.
8) Funding goal: The funding goal in USD.
9) Campaign description: Textual descriptions of the campaign in the Story area (all paragraphs).
10) Product category: The category of the campaign.
11) Created at: The date when the campaign starts in YYYY-MM-DD format.
12) Ended at: The date when the campaign ends in YYYY-MM-DD format.
13) Campaign duration: Ended at – Created at
14) FAQ: The number of FAQs the campaign has posted.
15) Update: The number of updates the campaign has posted.
16) Campaign highlight: A highlighted description of the campaign.
17) Staff pick: 1 if the campaign is highlighted with ‘Project We Love’, 0 otherwise.
18) Video: Whether the campaign includes a product video. If so, a link to the video (e.g., https://v2.kickstarter.com/1729490278-fJ%2B7S6iBSCK1BtBtXLYGwG%2BCIQLRoxH4j%2F9AmyOeiY0%3D/campaigns/4816123/video-1310959-h264_base.mp4)
19) Creator ID: The unique creator identifier. Each creator should have a unique ID. It is preferred to use the same ID that Kickstarter uses, which may be embedded in an HTML element.
20) Creator name: The creator’s screen name.
21) Creator profile URL: The creator profile link (e.g., https://www.kickstarter.com/profile/swiftshape)
22) First created: Whether the campaign is the first campaign of the creator.
23) Creator joined at: The date when the creator joined Kickstarter.
24) Creator backed campaigns count: The count of other campaigns the creator has backed.
25) Creator created campaigns count: The count of campaigns the creator has created.
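Two of the fields above are derived rather than scraped: US campaign (7) and Campaign duration (13). The sketch below shows one possible way to compute them. It assumes that the duration is reported in whole days and that the location has already been resolved to a two-letter country code; neither assumption is fixed by this RFP, and the function names are hypothetical.

```python
from datetime import date


def campaign_duration_days(created_at: str, ended_at: str) -> int:
    """Field 13: Ended at minus Created at, in days (dates as YYYY-MM-DD)."""
    start = date.fromisoformat(created_at)
    end = date.fromisoformat(ended_at)
    # An end date before the start date signals a scraping or parsing error.
    if end < start:
        raise ValueError(f"ended_at {ended_at} precedes created_at {created_at}")
    return (end - start).days


def is_us_campaign(country_code: str) -> int:
    """Field 7: 1 if the campaign is located in the US, 0 otherwise.

    Assumes the location was already resolved to a two-letter country code.
    """
    return 1 if country_code.strip().upper() == "US" else 0
```

Whichever definitions you adopt, please state them in the documentation so the derived fields can be reproduced from the scraped ones.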
Milestone 1.2: Campaigns Backed by the Creator
For every creator in the Phase 1 dataset, the requirement is to gather their backed campaigns.

Milestone 1.3: Creator Created Campaigns
For every creator in the Phase 1 dataset, the requirement is to gather their previously created campaigns. For each previous campaign, we also need its funding result and goal amount.
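The created-campaign history can also be used to derive or cross-check the First created field (22). The following is a minimal sketch under stated assumptions: the keys `campaign_id`, `creator_id`, and `created_at` are hypothetical, the input must contain each creator's full list of created campaigns, and ties on the same date are broken by the lower campaign ID, which is an arbitrary choice.

```python
from datetime import date


def flag_first_created(campaigns):
    """Return {campaign_id: 1 or 0}, where 1 marks each creator's earliest campaign.

    `campaigns` is a list of dicts with hypothetical keys
    "campaign_id", "creator_id", and "created_at" (YYYY-MM-DD).
    """
    earliest = {}  # creator_id -> (created_at, campaign_id) of the earliest campaign
    for c in campaigns:
        # Tuples compare by date first, then by campaign ID (arbitrary tie-break).
        key = (date.fromisoformat(c["created_at"]), c["campaign_id"])
        current = earliest.get(c["creator_id"])
        if current is None or key < current:
            earliest[c["creator_id"]] = key
    first_ids = {campaign_id for _, campaign_id in earliest.values()}
    return {c["campaign_id"]: int(c["campaign_id"] in first_ids) for c in campaigns}
```

If a creator's history is incomplete, the flag will be wrong for that creator, so this check is only as reliable as the coverage of this milestone.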

Phase 2: Kickstarter campaign/creator information for new dataset

Milestone 2.1: Kickstarter Campaigns from April 2013 through May 2024
For all Kickstarter campaigns launched from April 2013 through May 2024, the following information should be collected. This task is essentially a repetition of Phase 1 with recent campaigns.
1) Campaign ID: The unique identifier. Each campaign should have a unique ID. It is preferred to use the same ID that Kickstarter uses, which may be embedded in an HTML element.
2) Campaign URL
3) Campaign name
4) Funds raised: The total amount of funds raised for the campaign in USD.
5) Backers: The total number of backers for the campaign.
6) Location: The location of the campaign.
7) US campaign: 1 if the campaign is located in the US, 0 otherwise.
8) Funding goal: The funding goal in USD.
9) Campaign description: Textual descriptions of the campaign in the Story area (all paragraphs).
10) Product category: The category of the campaign.
11) Created at: The date when the campaign starts in YYYY-MM-DD format.
12) Ended at: The date when the campaign ends in YYYY-MM-DD format.
13) Campaign duration: Ended at – Created at
14) FAQ: The number of FAQs the campaign has posted.
15) Update: The number of updates the campaign has posted.
16) Campaign highlight: A highlighted description of the campaign.
17) Staff pick: 1 if the campaign is highlighted with ‘Project We Love’, 0 otherwise.
18) Video: Whether the campaign includes a product video. If so, a link to the video (e.g., https://v2.kickstarter.com/1729490278-fJ%2B7S6iBSCK1BtBtXLYGwG%2BCIQLRoxH4j%2F9AmyOeiY0%3D/campaigns/4816123/video-1310959-h264_base.mp4)
19) Creator ID: The unique creator identifier. Each creator should have a unique ID. It is preferred to use the same ID that Kickstarter uses, which may be embedded in an HTML element.
20) Creator name: The creator’s screen name.
21) Creator profile URL: The creator profile link (e.g., https://www.kickstarter.com/profile/swiftshape)
22) First created: Whether the campaign is the first by the creator.
23) Creator joined at: The date when the creator joined Kickstarter.
24) Creator backed campaigns count: The count of other campaigns the creator has backed.
25) Creator created campaigns count: The count of campaigns the creator has created.
Milestone 2.2: Campaigns Backed by the Creator
For every creator in the Phase 2 dataset, the requirement is to gather their backed campaigns.

Milestone 2.3: Creator Created Campaigns
For every creator in the Phase 2 dataset, the requirement is to gather their previously created campaigns. For each previous campaign, we also need its funding result and goal amount.

Phase 3: Representative image
For every Kickstarter campaign in Phases 1 and 2, it is required to collect its representative image. Please refer to Figure 1 for an example of what the representative image should look like.
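Because each image must be delivered with its matching campaign ID, one simple convention is to name each file after the campaign ID. The sketch below is only an illustration of that idea: the `image_filename` helper, the `.jpg` fallback extension, and the example URL are assumptions, not requirements.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def image_filename(campaign_id, image_url, default_ext=".jpg"):
    """Build a file name such as '<campaign_id>.png' from the image URL's extension."""
    # Take the extension from the URL path only, ignoring any query string.
    ext = PurePosixPath(urlparse(image_url).path).suffix.lower()
    # Fall back to a default extension when the URL path has none.
    return f"{campaign_id}{ext or default_ext}"


if __name__ == "__main__":
    print(image_filename(4816123, "https://example.com/assets/photo-original.PNG?width=680"))
```

A manifest listing campaign ID, source image URL, and saved file name would make it straightforward to merge the images with the Phase 1 and Phase 2 dataframes.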

Phase 4: Video Description
For every Kickstarter campaign in Phases 1 and 2, it is required to collect the campaign video and generate a high-quality transcript of it. Alternatively, you can provide a textual description of the video content generated by an AI model, which may involve inputting the video into the AI model. This phase is flexible and open to discussion with respect to technical and resource constraints.
