Spaces:

ash-coded-it
/

sithscripts

Runtime error

File size: 59,220 Bytes

c82bf42

{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Summarize chain"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "- Loaded .env file successfully.\n",
      "- API_ENV: local\n",
      "- DOCKER_MODE: True\n",
      "conversation tokens: 2111\n",
      "document length: 1\n"
     ]
    }
   ],
   "source": [
    "from langchain.text_splitter import TokenTextSplitter\n",
    "from app.common.constants import CONVERSATION_EXAMPLES, ChatTurnTemplates\n",
    "from app.shared import Shared\n",
    "\n",
    "splitter = TokenTextSplitter(\n",
    "    encoding_name=\"cl100k_base\",\n",
    "    chunk_size=4096,\n",
    "    chunk_overlap=0,\n",
    ")\n",
    "conversation: str = \"\".join(\n",
    "    ChatTurnTemplates.ROLE_CONTENT_1.format(\n",
    "        role=message[\"role\"].upper(), content=message[\"content\"].strip()\n",
    "    )\n",
    "    for message in CONVERSATION_EXAMPLES\n",
    ")\n",
    "tokens: int = len(splitter._tokenizer.encode(conversation))\n",
    "\n",
    "docs = splitter.create_documents([conversation])\n",
    "\n",
    "print(f\"conversation tokens: {tokens}\")\n",
    "print(f\"document length: {len(docs)}\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Conversation Summarize Chain"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\u001b[1m> Entering new  chain...\u001b[0m\n",
      "\n",
      "\n",
      "\u001b[1m> Entering new  chain...\u001b[0m\n",
      "Prompt after formatting:\n",
      "\u001b[32;1m\u001b[1;3mWrite a concise summary of the following text delimited by triple backquotes. Return your response in bullet points which covers the key points of the text.\n",
      "```\n",
      "### USER: are you familiar with a technology startup called Jigsaw that was acquired by Salesforce.com? can you provide a detailed summary of its company history, business model, and product?\n",
      "### ASSISTANT: As of my knowledge cutoff date in September 2021, I am not aware of a technology startup called Jigsaw that was acquired by Salesforce.com. However, there was a company called Jigsaw Data Corporation, which was acquired by Salesforce in 2010. I can provide you with information on that company, but please note that my knowledge may be outdated, and there may be another Jigsaw startup that I am not aware of.\n",
      "Jigsaw Data Corporation was a technology company founded in 2004 by Jim Fowler and Garth Moulton. The company specialized in providing crowd-sourced business contact data, offering a cloud-based platform for users to access and update business contacts and company information.\n",
      "The business model of Jigsaw was unique at that time, as it relied on a collaborative, crowd-sourced approach to collect and maintain data. Users could access the platform for free by contributing their own contact information, or they could purchase points to access the data without contributing. This model incentivized users to add and update contact information, ensuring the database remained fresh and accurate.\n",
      "Jigsaw's primary product was an online directory of business contacts and company information. The platform provided its users with access to a vast database containing millions of business contacts across various industries. Users could search for contacts by various criteria such as name, title, company, or location. Additionally, Jigsaw offered a suite of data management tools that allowed users to import, export, and manage their contacts efficiently.\n",
      "In April 2010, Salesforce.com, a leading cloud-based customer relationship management (CRM) company, announced the acquisition of Jigsaw Data Corporation for approximately $142 million in cash. The acquisition aimed to enhance Salesforce's CRM offerings by integrating Jigsaw's data services into its platform, allowing users to access updated business contact information within their CRM system directly.\n",
      "After the acquisition, Jigsaw's product was rebranded as \"Data.com\" and became an integral part of the Salesforce.com ecosystem. Data.com offered Salesforce customers access to the crowd-sourced contact database, as well as additional data management and enrichment services.\n",
      "Please note that this information is accurate as of my knowledge cutoff date in September 2021, and there may have been further developments or changes to the company and its products since then.\n",
      "### USER: That's the correct company. Thank you. can you tell me more about its unique crowd-sourced approach to collecting and maintaining data? which other companies have used a similar approach?\n",
      "### ASSISTANT: Jigsaw Data Corporation's crowd-sourced approach to collecting and maintaining business contact data relied on its user community to contribute, update, and verify information. This model had several key components:\n",
      "\n",
      "Contribution: Users could create a free account on Jigsaw's platform by contributing their own business contact information. They could also add new contacts to the database and gain points, which could be used to access other contacts' information.\n",
      "\n",
      "Verification: Jigsaw encouraged users to verify the accuracy of contact information in the database by rewarding them with points for each correction made. This created an incentive for users to keep the data up-to-date and accurate.\n",
      "\n",
      "Points system: Jigsaw's points system played a crucial role in its crowd-sourced model. Users could earn points by contributing new contacts, verifying existing contacts, or referring new users to the platform. These points could then be used to access contact information without having to contribute their own.\n",
      "\n",
      "Data exchange: Jigsaw allowed users to exchange their points for contact information. Users who didn't want to contribute their own data could purchase points, while those who contributed regularly could access the database for free by using their earned points.\n",
      "\n",
      "Data quality: To ensure data quality, Jigsaw implemented a rating system for users based on their contributions and accuracy. Users with higher ratings were considered more reliable, which motivated users to contribute accurate and up-to-date information.\n",
      "\n",
      "Several other companies have used crowd-sourcing techniques in different industries, although not necessarily in the same way as Jigsaw:\n",
      "\n",
      "Waze: Waze is a popular GPS navigation app that uses crowd-sourced data from its users to provide real-time traffic information, road conditions, and other driving-related updates.\n",
      "\n",
      "Wikipedia: Wikipedia is a free online encyclopedia that relies on its user community to create, edit, and maintain its content. It has become one of the largest and most comprehensive sources of information on the internet.\n",
      "\n",
      "OpenStreetMap: OpenStreetMap is a collaborative project that allows users to create and edit map data. It relies on a global community of contributors to maintain accurate and up-to-date mapping information.\n",
      "\n",
      "Mechanical Turk: Amazon's Mechanical Turk is a marketplace for human intelligence tasks (HITs), which allows users to complete small tasks in exchange for payment. It relies on the crowd to complete tasks that are challenging for machines but simple for humans.\n",
      "\n",
      "Kaggle: Kaggle is a platform for data scientists and machine learning practitioners to participate in competitions, collaborate on projects, and learn from each other. It uses crowd-sourcing to find innovative solutions to complex data problems.\n",
      "### USER: do you think that a crowd-sourced approach to data collection could work for an AI startup? specifically, the startup would award users points for contributing proprietary data used to train the AI model. in exchange, users could spend points by asking the AI questions. for example, an \"AI architect\" startup could give users points for contributing architectural plans and CAD drawings. users could then spend those points by asking the AI to design new plans. do you think this idea is viable?\n",
      "### ASSISTANT: A crowd-sourced approach to data collection can indeed work for an AI startup, especially if the startup aims to develop AI models that require a vast and diverse dataset to perform well. The idea you've proposed, in which users contribute proprietary data to train the AI model and can later use the AI's services by spending points, has some potential advantages:\n",
      "\n",
      "Incentivized data contribution: By rewarding users with points for contributing data, you create an incentive for them to share valuable information, which can help grow and diversify your dataset.\n",
      "\n",
      "Encouraging user engagement: A points-based system encourages users to actively participate in the platform, which can help improve the AI model over time and increase user satisfaction.\n",
      "\n",
      "Cost-effective data acquisition: Crowd-sourcing can be a cost-effective way to gather large amounts of data, as it leverages the efforts of a large community rather than relying on paid data collection services.\n",
      "\n",
      "Continuous improvement: As users contribute more data and use the AI's services, the model can be iteratively improved, leading to better performance and more valuable insights.\n",
      "\n",
      "However, there are also challenges and potential pitfalls to consider:\n",
      "\n",
      "Data quality: Ensuring the quality and accuracy of the contributed data is critical. You may need to implement verification processes, such as peer review or expert validation, to maintain data quality.\n",
      "\n",
      "Intellectual property and privacy: Handling proprietary data comes with legal and ethical considerations. You'll need to establish clear terms of use, ensure data protection, and address any intellectual property concerns when users contribute data.\n",
      "\n",
      "Scalability: If the AI's services become popular, you may face challenges related to scaling the infrastructure and resources needed to support a growing user base.\n",
      "\n",
      "User retention: Maintaining user engagement and interest in the platform can be challenging, especially as users may eventually exhaust their need for the AI's services.\n",
      "\n",
      "Monetization: Balancing the points-based system with a sustainable revenue model might be challenging, as users may prefer to use the platform for free by contributing data rather than paying for services.\n",
      "\n",
      "Overall, the viability of this idea depends on various factors, including the nature of the AI model, the target market, the quality and relevance of the contributed data, and the ability to address the challenges mentioned above. If the startup can successfully navigate these challenges, a crowd-sourced approach to data collection could be a promising strategy.\n",
      "### USER: thank you, that was an excellent answer. besides architecture, what are some other areas where this crowd-sourced approach to data collection could be used in order to improve an AI model? specifically, what are some valuable data sets that are somewhat proprietary and hard to get, but users who own that data may be willing to share it in exchange for points?\n",
      "### ASSISTANT: Thank you for your kind words. There are several areas where a crowd-sourced approach to data collection could be used to improve AI models by leveraging valuable and somewhat proprietary datasets. Here are a few examples:\n",
      "\n",
      "Medical and health data: AI models can greatly benefit from access to diverse patient data, such as electronic health records, medical imaging, and genomic data. Users (patients or healthcare professionals) might be willing to share anonymized data in exchange for points, which could then be used to access AI-driven health insights, personalized treatment suggestions, or early disease detection.\n",
      "\n",
      "Legal document analysis: Law firms and legal professionals often have access to large collections of legal documents, such as contracts, court rulings, or patent filings. By sharing these documents, users could contribute to training AI models for legal document analysis, and in return, gain access to AI-driven legal research tools or contract review services.\n",
      "\n",
      "Art and creative work: Artists and designers may possess large collections of their own artwork, sketches, or designs. Sharing this data could help train AI models for artistic style transfer, generative art, or design assistance. Users could then access AI-driven creative tools or personalized design suggestions.\n",
      "\n",
      "Finance and investment: Financial professionals and investors may have access to proprietary trading algorithms, portfolio data, or market analysis reports. By sharing this data, they could contribute to AI models for financial analysis and predictions. In return, users could gain access to AI-driven investment advice, risk assessment, or market forecasting tools.\n",
      "\n",
      "Scientific research data: Researchers in various fields might have access to valuable datasets generated through experiments or simulations. By sharing this data, they can help train AI models for data analysis, pattern recognition, or predictive modeling in their respective domains. Users could then access AI-driven research tools or personalized research recommendations.\n",
      "\n",
      "Manufacturing and production data: Companies involved in manufacturing and production may possess proprietary data on production processes, quality control, and equipment performance. Sharing this data could improve AI models for predictive maintenance, process optimization, and quality assurance. Users could then gain access to AI-driven optimization suggestions or equipment monitoring services.\n",
      "\n",
      "```\n",
      "\n",
      "BULLET POINT SUMMARY:\n",
      "\u001b[0m\n",
      "DEBUG: Request {'method': 'post', 'url': '/chat/completions', 'session': <aiohttp.client.ClientSession object at 0x000001B5EF69B750>, 'params': {'messages': [{'role': 'user', 'content': 'Write a concise summary of the following text delimited by triple backquotes. Return your response in bullet points which covers the key points of the text.\\n```\\n### USER: are you familiar with a technology startup called Jigsaw that was acquired by Salesforce.com? can you provide a detailed summary of its company history, business model, and product?\\n### ASSISTANT: As of my knowledge cutoff date in September 2021, I am not aware of a technology startup called Jigsaw that was acquired by Salesforce.com. However, there was a company called Jigsaw Data Corporation, which was acquired by Salesforce in 2010. I can provide you with information on that company, but please note that my knowledge may be outdated, and there may be another Jigsaw startup that I am not aware of.\\nJigsaw Data Corporation was a technology company founded in 2004 by Jim Fowler and Garth Moulton. The company specialized in providing crowd-sourced business contact data, offering a cloud-based platform for users to access and update business contacts and company information.\\nThe business model of Jigsaw was unique at that time, as it relied on a collaborative, crowd-sourced approach to collect and maintain data. Users could access the platform for free by contributing their own contact information, or they could purchase points to access the data without contributing. This model incentivized users to add and update contact information, ensuring the database remained fresh and accurate.\\nJigsaw\\'s primary product was an online directory of business contacts and company information. The platform provided its users with access to a vast database containing millions of business contacts across various industries. Users could search for contacts by various criteria such as name, title, company, or location. Additionally, Jigsaw offered a suite of data management tools that allowed users to import, export, and manage their contacts efficiently.\\nIn April 2010, Salesforce.com, a leading cloud-based customer relationship management (CRM) company, announced the acquisition of Jigsaw Data Corporation for approximately $142 million in cash. The acquisition aimed to enhance Salesforce\\'s CRM offerings by integrating Jigsaw\\'s data services into its platform, allowing users to access updated business contact information within their CRM system directly.\\nAfter the acquisition, Jigsaw\\'s product was rebranded as \"Data.com\" and became an integral part of the Salesforce.com ecosystem. Data.com offered Salesforce customers access to the crowd-sourced contact database, as well as additional data management and enrichment services.\\nPlease note that this information is accurate as of my knowledge cutoff date in September 2021, and there may have been further developments or changes to the company and its products since then.\\n### USER: That\\'s the correct company. Thank you. can you tell me more about its unique crowd-sourced approach to collecting and maintaining data? which other companies have used a similar approach?\\n### ASSISTANT: Jigsaw Data Corporation\\'s crowd-sourced approach to collecting and maintaining business contact data relied on its user community to contribute, update, and verify information. This model had several key components:\\n\\nContribution: Users could create a free account on Jigsaw\\'s platform by contributing their own business contact information. They could also add new contacts to the database and gain points, which could be used to access other contacts\\' information.\\n\\nVerification: Jigsaw encouraged users to verify the accuracy of contact information in the database by rewarding them with points for each correction made. This created an incentive for users to keep the data up-to-date and accurate.\\n\\nPoints system: Jigsaw\\'s points system played a crucial role in its crowd-sourced model. Users could earn points by contributing new contacts, verifying existing contacts, or referring new users to the platform. These points could then be used to access contact information without having to contribute their own.\\n\\nData exchange: Jigsaw allowed users to exchange their points for contact information. Users who didn\\'t want to contribute their own data could purchase points, while those who contributed regularly could access the database for free by using their earned points.\\n\\nData quality: To ensure data quality, Jigsaw implemented a rating system for users based on their contributions and accuracy. Users with higher ratings were considered more reliable, which motivated users to contribute accurate and up-to-date information.\\n\\nSeveral other companies have used crowd-sourcing techniques in different industries, although not necessarily in the same way as Jigsaw:\\n\\nWaze: Waze is a popular GPS navigation app that uses crowd-sourced data from its users to provide real-time traffic information, road conditions, and other driving-related updates.\\n\\nWikipedia: Wikipedia is a free online encyclopedia that relies on its user community to create, edit, and maintain its content. It has become one of the largest and most comprehensive sources of information on the internet.\\n\\nOpenStreetMap: OpenStreetMap is a collaborative project that allows users to create and edit map data. It relies on a global community of contributors to maintain accurate and up-to-date mapping information.\\n\\nMechanical Turk: Amazon\\'s Mechanical Turk is a marketplace for human intelligence tasks (HITs), which allows users to complete small tasks in exchange for payment. It relies on the crowd to complete tasks that are challenging for machines but simple for humans.\\n\\nKaggle: Kaggle is a platform for data scientists and machine learning practitioners to participate in competitions, collaborate on projects, and learn from each other. It uses crowd-sourcing to find innovative solutions to complex data problems.\\n### USER: do you think that a crowd-sourced approach to data collection could work for an AI startup? specifically, the startup would award users points for contributing proprietary data used to train the AI model. in exchange, users could spend points by asking the AI questions. for example, an \"AI architect\" startup could give users points for contributing architectural plans and CAD drawings. users could then spend those points by asking the AI to design new plans. do you think this idea is viable?\\n### ASSISTANT: A crowd-sourced approach to data collection can indeed work for an AI startup, especially if the startup aims to develop AI models that require a vast and diverse dataset to perform well. The idea you\\'ve proposed, in which users contribute proprietary data to train the AI model and can later use the AI\\'s services by spending points, has some potential advantages:\\n\\nIncentivized data contribution: By rewarding users with points for contributing data, you create an incentive for them to share valuable information, which can help grow and diversify your dataset.\\n\\nEncouraging user engagement: A points-based system encourages users to actively participate in the platform, which can help improve the AI model over time and increase user satisfaction.\\n\\nCost-effective data acquisition: Crowd-sourcing can be a cost-effective way to gather large amounts of data, as it leverages the efforts of a large community rather than relying on paid data collection services.\\n\\nContinuous improvement: As users contribute more data and use the AI\\'s services, the model can be iteratively improved, leading to better performance and more valuable insights.\\n\\nHowever, there are also challenges and potential pitfalls to consider:\\n\\nData quality: Ensuring the quality and accuracy of the contributed data is critical. You may need to implement verification processes, such as peer review or expert validation, to maintain data quality.\\n\\nIntellectual property and privacy: Handling proprietary data comes with legal and ethical considerations. You\\'ll need to establish clear terms of use, ensure data protection, and address any intellectual property concerns when users contribute data.\\n\\nScalability: If the AI\\'s services become popular, you may face challenges related to scaling the infrastructure and resources needed to support a growing user base.\\n\\nUser retention: Maintaining user engagement and interest in the platform can be challenging, especially as users may eventually exhaust their need for the AI\\'s services.\\n\\nMonetization: Balancing the points-based system with a sustainable revenue model might be challenging, as users may prefer to use the platform for free by contributing data rather than paying for services.\\n\\nOverall, the viability of this idea depends on various factors, including the nature of the AI model, the target market, the quality and relevance of the contributed data, and the ability to address the challenges mentioned above. If the startup can successfully navigate these challenges, a crowd-sourced approach to data collection could be a promising strategy.\\n### USER: thank you, that was an excellent answer. besides architecture, what are some other areas where this crowd-sourced approach to data collection could be used in order to improve an AI model? specifically, what are some valuable data sets that are somewhat proprietary and hard to get, but users who own that data may be willing to share it in exchange for points?\\n### ASSISTANT: Thank you for your kind words. There are several areas where a crowd-sourced approach to data collection could be used to improve AI models by leveraging valuable and somewhat proprietary datasets. Here are a few examples:\\n\\nMedical and health data: AI models can greatly benefit from access to diverse patient data, such as electronic health records, medical imaging, and genomic data. Users (patients or healthcare professionals) might be willing to share anonymized data in exchange for points, which could then be used to access AI-driven health insights, personalized treatment suggestions, or early disease detection.\\n\\nLegal document analysis: Law firms and legal professionals often have access to large collections of legal documents, such as contracts, court rulings, or patent filings. By sharing these documents, users could contribute to training AI models for legal document analysis, and in return, gain access to AI-driven legal research tools or contract review services.\\n\\nArt and creative work: Artists and designers may possess large collections of their own artwork, sketches, or designs. Sharing this data could help train AI models for artistic style transfer, generative art, or design assistance. Users could then access AI-driven creative tools or personalized design suggestions.\\n\\nFinance and investment: Financial professionals and investors may have access to proprietary trading algorithms, portfolio data, or market analysis reports. By sharing this data, they could contribute to AI models for financial analysis and predictions. In return, users could gain access to AI-driven investment advice, risk assessment, or market forecasting tools.\\n\\nScientific research data: Researchers in various fields might have access to valuable datasets generated through experiments or simulations. By sharing this data, they can help train AI models for data analysis, pattern recognition, or predictive modeling in their respective domains. Users could then access AI-driven research tools or personalized research recommendations.\\n\\nManufacturing and production data: Companies involved in manufacturing and production may possess proprietary data on production processes, quality control, and equipment performance. Sharing this data could improve AI models for predictive maintenance, process optimization, and quality assurance. Users could then gain access to AI-driven optimization suggestions or equipment monitoring services.\\n\\n```\\n\\nBULLET POINT SUMMARY:\\n'}], 'model': 'gpt-3.5-turbo-0613', 'max_tokens': None, 'stream': False, 'n': 1, 'temperature': 0.7}, 'supplied_headers': None, 'files': None, 'request_id': None, 'request_timeout': None}\n",
      "DEBUG: request_kwargs {'method': 'post', 'url': 'https://api.openai.com/v1/chat/completions', 'headers': {'X-OpenAI-Client-User-Agent': '{\"bindings_version\": \"0.27.8\", \"httplib\": \"requests\", \"lang\": \"python\", \"lang_version\": \"3.11.4\", \"platform\": \"Windows-10-10.0.22621-SP0\", \"publisher\": \"openai\", \"uname\": \"Windows 10 10.0.22621 AMD64 Intel64 Family 6 Model 183 Stepping 1, GenuineIntel\"}', 'User-Agent': 'OpenAI/v1 PythonBindings/0.27.8', 'Authorization': 'Bearer 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', 'Content-Type': 'application/json'}, 'data': b'{\"messages\": [{\"role\": \"user\", \"content\": \"Write a concise summary of the following text delimited by triple backquotes. Return your response in bullet points which covers the key points of the text.\\\\n```\\\\n### USER: are you familiar with a technology startup called Jigsaw that was acquired by Salesforce.com? can you provide a detailed summary of its company history, business model, and product?\\\\n### ASSISTANT: As of my knowledge cutoff date in September 2021, I am not aware of a technology startup called Jigsaw that was acquired by Salesforce.com. However, there was a company called Jigsaw Data Corporation, which was acquired by Salesforce in 2010. I can provide you with information on that company, but please note that my knowledge may be outdated, and there may be another Jigsaw startup that I am not aware of.\\\\nJigsaw Data Corporation was a technology company founded in 2004 by Jim Fowler and Garth Moulton. The company specialized in providing crowd-sourced business contact data, offering a cloud-based platform for users to access and update business contacts and company information.\\\\nThe business model of Jigsaw was unique at that time, as it relied on a collaborative, crowd-sourced approach to collect and maintain data. Users could access the platform for free by contributing their own contact information, or they could purchase points to access the data without contributing. This model incentivized users to add and update contact information, ensuring the database remained fresh and accurate.\\\\nJigsaw\\'s primary product was an online directory of business contacts and company information. The platform provided its users with access to a vast database containing millions of business contacts across various industries. Users could search for contacts by various criteria such as name, title, company, or location. Additionally, Jigsaw offered a suite of data management tools that allowed users to import, export, and manage their contacts efficiently.\\\\nIn April 2010, Salesforce.com, a leading cloud-based customer relationship management (CRM) company, announced the acquisition of Jigsaw Data Corporation for approximately $142 million in cash. The acquisition aimed to enhance Salesforce\\'s CRM offerings by integrating Jigsaw\\'s data services into its platform, allowing users to access updated business contact information within their CRM system directly.\\\\nAfter the acquisition, Jigsaw\\'s product was rebranded as \\\\\"Data.com\\\\\" and became an integral part of the Salesforce.com ecosystem. Data.com offered Salesforce customers access to the crowd-sourced contact database, as well as additional data management and enrichment services.\\\\nPlease note that this information is accurate as of my knowledge cutoff date in September 2021, and there may have been further developments or changes to the company and its products since then.\\\\n### USER: That\\'s the correct company. Thank you. can you tell me more about its unique crowd-sourced approach to collecting and maintaining data? which other companies have used a similar approach?\\\\n### ASSISTANT: Jigsaw Data Corporation\\'s crowd-sourced approach to collecting and maintaining business contact data relied on its user community to contribute, update, and verify information. This model had several key components:\\\\n\\\\nContribution: Users could create a free account on Jigsaw\\'s platform by contributing their own business contact information. They could also add new contacts to the database and gain points, which could be used to access other contacts\\' information.\\\\n\\\\nVerification: Jigsaw encouraged users to verify the accuracy of contact information in the database by rewarding them with points for each correction made. This created an incentive for users to keep the data up-to-date and accurate.\\\\n\\\\nPoints system: Jigsaw\\'s points system played a crucial role in its crowd-sourced model. Users could earn points by contributing new contacts, verifying existing contacts, or referring new users to the platform. These points could then be used to access contact information without having to contribute their own.\\\\n\\\\nData exchange: Jigsaw allowed users to exchange their points for contact information. Users who didn\\'t want to contribute their own data could purchase points, while those who contributed regularly could access the database for free by using their earned points.\\\\n\\\\nData quality: To ensure data quality, Jigsaw implemented a rating system for users based on their contributions and accuracy. Users with higher ratings were considered more reliable, which motivated users to contribute accurate and up-to-date information.\\\\n\\\\nSeveral other companies have used crowd-sourcing techniques in different industries, although not necessarily in the same way as Jigsaw:\\\\n\\\\nWaze: Waze is a popular GPS navigation app that uses crowd-sourced data from its users to provide real-time traffic information, road conditions, and other driving-related updates.\\\\n\\\\nWikipedia: Wikipedia is a free online encyclopedia that relies on its user community to create, edit, and maintain its content. It has become one of the largest and most comprehensive sources of information on the internet.\\\\n\\\\nOpenStreetMap: OpenStreetMap is a collaborative project that allows users to create and edit map data. It relies on a global community of contributors to maintain accurate and up-to-date mapping information.\\\\n\\\\nMechanical Turk: Amazon\\'s Mechanical Turk is a marketplace for human intelligence tasks (HITs), which allows users to complete small tasks in exchange for payment. It relies on the crowd to complete tasks that are challenging for machines but simple for humans.\\\\n\\\\nKaggle: Kaggle is a platform for data scientists and machine learning practitioners to participate in competitions, collaborate on projects, and learn from each other. It uses crowd-sourcing to find innovative solutions to complex data problems.\\\\n### USER: do you think that a crowd-sourced approach to data collection could work for an AI startup? specifically, the startup would award users points for contributing proprietary data used to train the AI model. in exchange, users could spend points by asking the AI questions. for example, an \\\\\"AI architect\\\\\" startup could give users points for contributing architectural plans and CAD drawings. users could then spend those points by asking the AI to design new plans. do you think this idea is viable?\\\\n### ASSISTANT: A crowd-sourced approach to data collection can indeed work for an AI startup, especially if the startup aims to develop AI models that require a vast and diverse dataset to perform well. The idea you\\'ve proposed, in which users contribute proprietary data to train the AI model and can later use the AI\\'s services by spending points, has some potential advantages:\\\\n\\\\nIncentivized data contribution: By rewarding users with points for contributing data, you create an incentive for them to share valuable information, which can help grow and diversify your dataset.\\\\n\\\\nEncouraging user engagement: A points-based system encourages users to actively participate in the platform, which can help improve the AI model over time and increase user satisfaction.\\\\n\\\\nCost-effective data acquisition: Crowd-sourcing can be a cost-effective way to gather large amounts of data, as it leverages the efforts of a large community rather than relying on paid data collection services.\\\\n\\\\nContinuous improvement: As users contribute more data and use the AI\\'s services, the model can be iteratively improved, leading to better performance and more valuable insights.\\\\n\\\\nHowever, there are also challenges and potential pitfalls to consider:\\\\n\\\\nData quality: Ensuring the quality and accuracy of the contributed data is critical. You may need to implement verification processes, such as peer review or expert validation, to maintain data quality.\\\\n\\\\nIntellectual property and privacy: Handling proprietary data comes with legal and ethical considerations. You\\'ll need to establish clear terms of use, ensure data protection, and address any intellectual property concerns when users contribute data.\\\\n\\\\nScalability: If the AI\\'s services become popular, you may face challenges related to scaling the infrastructure and resources needed to support a growing user base.\\\\n\\\\nUser retention: Maintaining user engagement and interest in the platform can be challenging, especially as users may eventually exhaust their need for the AI\\'s services.\\\\n\\\\nMonetization: Balancing the points-based system with a sustainable revenue model might be challenging, as users may prefer to use the platform for free by contributing data rather than paying for services.\\\\n\\\\nOverall, the viability of this idea depends on various factors, including the nature of the AI model, the target market, the quality and relevance of the contributed data, and the ability to address the challenges mentioned above. If the startup can successfully navigate these challenges, a crowd-sourced approach to data collection could be a promising strategy.\\\\n### USER: thank you, that was an excellent answer. besides architecture, what are some other areas where this crowd-sourced approach to data collection could be used in order to improve an AI model? specifically, what are some valuable data sets that are somewhat proprietary and hard to get, but users who own that data may be willing to share it in exchange for points?\\\\n### ASSISTANT: Thank you for your kind words. There are several areas where a crowd-sourced approach to data collection could be used to improve AI models by leveraging valuable and somewhat proprietary datasets. Here are a few examples:\\\\n\\\\nMedical and health data: AI models can greatly benefit from access to diverse patient data, such as electronic health records, medical imaging, and genomic data. Users (patients or healthcare professionals) might be willing to share anonymized data in exchange for points, which could then be used to access AI-driven health insights, personalized treatment suggestions, or early disease detection.\\\\n\\\\nLegal document analysis: Law firms and legal professionals often have access to large collections of legal documents, such as contracts, court rulings, or patent filings. By sharing these documents, users could contribute to training AI models for legal document analysis, and in return, gain access to AI-driven legal research tools or contract review services.\\\\n\\\\nArt and creative work: Artists and designers may possess large collections of their own artwork, sketches, or designs. Sharing this data could help train AI models for artistic style transfer, generative art, or design assistance. Users could then access AI-driven creative tools or personalized design suggestions.\\\\n\\\\nFinance and investment: Financial professionals and investors may have access to proprietary trading algorithms, portfolio data, or market analysis reports. By sharing this data, they could contribute to AI models for financial analysis and predictions. In return, users could gain access to AI-driven investment advice, risk assessment, or market forecasting tools.\\\\n\\\\nScientific research data: Researchers in various fields might have access to valuable datasets generated through experiments or simulations. By sharing this data, they can help train AI models for data analysis, pattern recognition, or predictive modeling in their respective domains. Users could then access AI-driven research tools or personalized research recommendations.\\\\n\\\\nManufacturing and production data: Companies involved in manufacturing and production may possess proprietary data on production processes, quality control, and equipment performance. Sharing this data could improve AI models for predictive maintenance, process optimization, and quality assurance. Users could then gain access to AI-driven optimization suggestions or equipment monitoring services.\\\\n\\\\n```\\\\n\\\\nBULLET POINT SUMMARY:\\\\n\"}], \"model\": \"gpt-3.5-turbo-0613\", \"max_tokens\": null, \"stream\": false, \"n\": 1, \"temperature\": 0.7}', 'proxy': None, 'timeout': ClientTimeout(total=600, connect=None, sock_read=None, sock_connect=None)}\n",
      "DEBUG: Result <ClientResponse(https://api.openai.com/v1/chat/completions) [200 OK]>\n",
      "<CIMultiDictProxy('Date': 'Fri, 14 Jul 2023 10:56:32 GMT', 'Content-Type': 'application/json', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Access-Control-Allow-Origin': '*', 'Cache-Control': 'no-cache, must-revalidate', 'openai-model': 'gpt-3.5-turbo-0613', 'openai-organization': 'user-qw6ihb92xdqxe5veajyqdfnb', 'openai-processing-ms': '8956', 'openai-version': '2020-10-01', 'strict-transport-security': 'max-age=15724800; includeSubDomains', 'x-ratelimit-limit-requests': '3500', 'x-ratelimit-limit-tokens': '90000', 'x-ratelimit-remaining-requests': '3499', 'x-ratelimit-remaining-tokens': '87033', 'x-ratelimit-reset-requests': '17ms', 'x-ratelimit-reset-tokens': '1.978s', 'x-request-id': '1b4860c1def085d5133016a49c2dd58d', 'CF-Cache-Status': 'DYNAMIC', 'Server': 'cloudflare', 'CF-RAY': '7e693d242fbf932c-ICN', 'Content-Encoding': 'gzip', 'alt-svc': 'h3=\":443\"; ma=86400')>\n",
      "\n",
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n",
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n",
      "summarized tokens: 260\n",
      "summarized text:\n",
      " - Jigsaw Data Corporation was a technology company acquired by Salesforce.com in 2010.\n",
      "- Jigsaw specialized in crowd-sourced business contact data, offering a platform for users to access and update contacts and company information.\n",
      "- The company's unique business model relied on users contributing data or purchasing points to access the database.\n",
      "- Jigsaw's primary product was an online directory of business contacts and company information, along with data management tools.\n",
      "- After the acquisition, Jigsaw's product was rebranded as \"Data.com\" and integrated into the Salesforce.com ecosystem.\n",
      "- Jigsaw's crowd-sourced approach involved contribution, verification, a points system, data exchange, and data quality rating.\n",
      "- Other companies that have used crowd-sourcing techniques include Waze, Wikipedia, OpenStreetMap, Mechanical Turk, and Kaggle.\n",
      "- A crowd-sourced approach to data collection can work for an AI startup, with potential advantages such as incentivized data contribution and cost-effective data acquisition.\n",
      "- Challenges to consider include data quality, intellectual property and privacy concerns, scalability, user retention, and monetization.\n",
      "- The crowd-sourced approach could be applied to areas such as medical and health data, legal document analysis, art and creative work, finance and investment, scientific research data, and manufacturing and production data.\n"
     ]
    }
   ],
   "source": [
    "conversation_summarize_chain = Shared().stuff_summarize_chain\n",
    "result = await conversation_summarize_chain.arun(docs)\n",
    "summarized_tokens: int = len(splitter._tokenizer.encode(result))\n",
    "print(f\"summarized tokens: {summarized_tokens}\")\n",
    "print(f\"summarized text:\\n {result}\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Map reduce summarize chain"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\u001b[1m> Entering new  chain...\u001b[0m\n",
      "\n",
      "\n",
      "\u001b[1m> Entering new  chain...\u001b[0m\n",
      "Prompt after formatting:\n",
      "\u001b[32;1m\u001b[1;3mWrite a concise summary of the following:\n",
      "\n",
      "\n",
      "\"### USER: are you familiar with a technology startup called Jigsaw that was acquired by Salesforce.com? can you provide a detailed summary of its company history, business model, and product?\n",
      "### ASSISTANT: As of my knowledge cutoff date in September 2021, I am not aware of a technology startup called Jigsaw that was acquired by Salesforce.com. However, there was a company called Jigsaw Data Corporation, which was acquired by Salesforce in 2010. I can provide you with information on that company, but please note that my knowledge may be outdated, and there may be another Jigsaw startup that I am not aware of.\n",
      "Jigsaw Data Corporation was a technology company founded in 2004 by Jim Fowler and Garth Moulton. The company specialized in providing crowd-sourced business contact data, offering a cloud-based platform for users to access and update business contacts and company information.\n",
      "The business model of Jigsaw was unique at that time, as it relied on a collaborative, crowd-sourced approach to collect and maintain data. Users could access the platform for free by contributing their own contact information, or they could purchase points to access the data without contributing. This model incentivized users to add and update contact information, ensuring the database remained fresh and accurate.\n",
      "Jigsaw's primary product was an online directory of business contacts and company information. The platform provided its users with access to a vast database containing millions of business contacts across various industries. Users could search for contacts by various criteria such as name, title, company, or location. Additionally, Jigsaw offered a suite of data management tools that allowed users to import, export, and manage their contacts efficiently.\n",
      "In April 2010, Salesforce.com, a leading cloud-based customer relationship management (CRM) company, announced the acquisition of Jigsaw Data Corporation for approximately $142 million in cash. The acquisition aimed to enhance Salesforce's CRM offerings by integrating Jigsaw's data services into its platform, allowing users to access updated business contact information within their CRM system directly.\n",
      "After the acquisition, Jigsaw's product was rebranded as \"Data.com\" and became an integral part of the Salesforce.com ecosystem. Data.com offered Salesforce customers access to the crowd-sourced contact database, as well as additional data management and enrichment services.\n",
      "Please note that this information is accurate as of my knowledge cutoff date in September 2021, and there may have been further developments or changes to the company and its products since then.\n",
      "### USER: That's the correct company. Thank you. can you tell me more about its unique crowd-sourced approach to collecting and maintaining data? which other companies have used a similar approach?\n",
      "### ASSISTANT: Jigsaw Data Corporation's crowd-sourced approach to collecting and maintaining business contact data relied on its user community to contribute, update, and verify information. This model had several key components:\n",
      "\n",
      "Contribution: Users could create a free account on Jigsaw's platform by contributing their own business contact information. They could also add new contacts to the database and gain points, which could be used to access other contacts' information.\n",
      "\n",
      "Verification: Jigsaw encouraged users to verify the accuracy of contact information in the database by rewarding them with points for each correction made. This created an incentive for users to keep the data up-to-date and accurate.\n",
      "\n",
      "Points system: Jigsaw's points system played a crucial role in its crowd-sourced model. Users could earn points by contributing new contacts, verifying existing contacts, or referring new users to the platform. These points could then be used to access contact information without having to contribute their own.\n",
      "\n",
      "Data exchange: Jigsaw allowed users to exchange their points for contact information. Users who didn't want to contribute their own data could purchase points, while those who contributed regularly could access the database for free by using their earned points.\n",
      "\n",
      "Data quality: To ensure data quality, Jigsaw implemented a rating system for users based on their contributions and accuracy. Users with higher ratings were considered more reliable, which motivated users to contribute accurate and up-to-date information.\n",
      "\n",
      "Several other companies have used crowd-sourcing techniques in different industries, although not necessarily in the same way as Jigsaw:\n",
      "\n",
      "Waze: Waze is a popular GPS navigation app that uses crowd-sourced data from its users to provide real-time traffic information, road conditions, and other driving-related updates.\n",
      "\n",
      "Wikipedia: Wikipedia is a free online encyclopedia that relies on its user community to create, edit, and maintain its content. It has become one of the largest and most comprehensive sources of information on the internet.\n",
      "\n",
      "OpenStreetMap: OpenStreetMap is a collaborative project that allows users to create and edit map data. It relies on a global community of contributors to maintain accurate and up-to-date mapping information.\n",
      "\n",
      "Mechanical Turk: Amazon's Mechanical Turk is a marketplace for human intelligence tasks (HITs), which allows users to complete small tasks in exchange for payment. It relies on the crowd to complete tasks that are challenging for machines but simple for humans.\n",
      "\n",
      "Kaggle: Kaggle is a platform for data scientists and machine learning practitioners to participate in competitions, collaborate on projects, and learn from each other. It uses crowd-sourcing to find innovative solutions to complex data problems.\n",
      "### USER: do you think that a crowd-sourced approach to data collection could work for an AI startup? specifically, the startup would award users points for contributing proprietary data used to train the AI model. in exchange, users could spend points by asking the AI questions. for example, an \"AI architect\" startup could give users points for contributing architectural plans and CAD drawings. users could then spend those points by asking the AI to design new plans. do you think this idea is viable?\n",
      "### ASSISTANT: A crowd-sourced approach to data collection can indeed work for an AI startup, especially if the startup aims to develop AI models that require a vast and diverse dataset to perform well. The idea you've proposed, in which users contribute proprietary data to train the AI model and can later use the AI's services by spending points, has some potential advantages:\n",
      "\n",
      "Incentivized data contribution: By rewarding users with points for contributing data, you create an incentive for them to share valuable information, which can help grow and diversify your dataset.\n",
      "\n",
      "Encouraging user engagement: A points-based system encourages users to actively participate in the platform, which can help improve the AI model over time and increase user satisfaction.\n",
      "\n",
      "Cost-effective data acquisition: Crowd-sourcing can be a cost-effective way to gather large amounts of data, as it leverages the efforts of a large community rather than relying on paid data collection services.\n",
      "\n",
      "Continuous improvement: As users contribute more data and use the AI's services, the model can be iteratively improved, leading to better performance and more valuable insights.\n",
      "\n",
      "However, there are also challenges and potential pitfalls to consider:\n",
      "\n",
      "Data quality: Ensuring the quality and accuracy of the contributed data is critical. You may need to implement verification processes, such as peer review or expert validation, to maintain data quality.\n",
      "\n",
      "Intellectual property and privacy: Handling proprietary data comes with legal and ethical considerations. You'll need to establish clear terms of use, ensure data protection, and address any intellectual property concerns when users contribute data.\n",
      "\n",
      "Scalability: If the AI's services become popular, you may face challenges related to scaling the infrastructure and resources needed to support a growing user base.\n",
      "\n",
      "User retention: Maintaining user engagement and interest in the platform can be challenging, especially as users may eventually exhaust their need for the AI's services.\n",
      "\n",
      "Monetization: Balancing the points-based system with a sustainable revenue model might be challenging, as users may prefer to use the platform for free by contributing data rather than paying for services.\n",
      "\n",
      "Overall, the viability of this idea depends on various factors, including the nature of the AI model, the target market, the quality and relevance of the contributed data, and the ability to address the challenges mentioned above. If the startup can successfully navigate these challenges, a crowd-sourced approach to data collection could be a promising strategy.\n",
      "### USER: thank you, that was an excellent answer. besides architecture, what are some other areas where this crowd-sourced approach to data collection could be used in order to improve an AI model? specifically, what are some valuable data sets that are somewhat proprietary and hard to get, but users who own that data may be willing to share it in exchange for points?\n",
      "### ASSISTANT: Thank you for your kind words. There are several areas where a crowd-sourced approach to data collection could be used to improve AI models by leveraging valuable and somewhat proprietary datasets. Here are a few examples:\n",
      "\n",
      "Medical and health data: AI models can greatly benefit from access to diverse patient data, such as electronic health records, medical imaging, and genomic data. Users (patients or healthcare professionals) might be willing to share anonymized data in exchange for points, which could then be used to access AI-driven health insights, personalized treatment suggestions, or early disease detection.\n",
      "\n",
      "Legal document analysis: Law firms and legal professionals often have access to large collections of legal documents, such as contracts, court rulings, or patent filings. By sharing these documents, users could contribute to training AI models for legal document analysis, and in return, gain access to AI-driven legal research tools or contract review services.\n",
      "\n",
      "Art and creative work: Artists and designers may possess large collections of their own artwork, sketches, or designs. Sharing this data could help train AI models for artistic style transfer, generative art, or design assistance. Users could then access AI-driven creative tools or personalized design suggestions.\n",
      "\n",
      "Finance and investment: Financial professionals and investors may have access to proprietary trading algorithms, portfolio data, or market analysis reports. By sharing this data, they could contribute to AI models for financial analysis and predictions. In return, users could gain access to AI-driven investment advice, risk assessment, or market forecasting tools.\n",
      "\n",
      "Scientific research data: Researchers in various fields might have access to valuable datasets generated through experiments or simulations. By sharing this data, they can help train AI models for data analysis, pattern recognition, or predictive modeling in their respective domains. Users could then access AI-driven research tools or personalized research recommendations.\n",
      "\n",
      "Manufacturing and production data: Companies involved in manufacturing and production may possess proprietary data on production processes, quality control, and equipment performance. Sharing this data could improve AI models for predictive maintenance, process optimization, and quality assurance. Users could then gain access to AI-driven optimization suggestions or equipment monitoring services.\n",
      "\"\n",
      "\n",
      "\n",
      "CONCISE SUMMARY:\u001b[0m\n",
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n",
      "\n",
      "\n",
      "\u001b[1m> Entering new  chain...\u001b[0m\n",
      "\n",
      "\n",
      "\u001b[1m> Entering new  chain...\u001b[0m\n",
      "Prompt after formatting:\n",
      "\u001b[32;1m\u001b[1;3mWrite a concise summary of the following text delimited by triple backquotes. Return your response in bullet points which covers the key points of the text.\n",
      "```\n",
      "Jigsaw Data Corporation was a technology company acquired by Salesforce.com in 2010. It specialized in crowd-sourced business contact data and offered a platform for users to access and update contacts. Their unique business model incentivized users to contribute and verify contact information. After the acquisition, Jigsaw's product was rebranded as \"Data.com\" and integrated into Salesforce's CRM platform. Crowd-sourcing techniques have been used by other companies like Waze, Wikipedia, and OpenStreetMap. A crowd-sourced approach to data collection could be viable for an AI startup, where users contribute proprietary data in exchange for points to access AI services. Possible applications include architecture, medical and health data, legal document analysis, art and creative work, finance and investment, scientific research data, and manufacturing and production data.\n",
      "```\n",
      "\n",
      "BULLET POINT SUMMARY:\n",
      "\u001b[0m\n",
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n",
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n",
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n",
      "- Jigsaw Data Corporation was acquired by Salesforce.com in 2010\n",
      "- Jigsaw specialized in crowd-sourced business contact data and provided a platform for users to access and update contacts\n",
      "- After the acquisition, Jigsaw's product was rebranded as \"Data.com\" and integrated into Salesforce's CRM platform\n",
      "- Crowd-sourcing techniques have been successfully used by companies like Waze, Wikipedia, and OpenStreetMap\n",
      "- Crowd-sourced data collection could be valuable for an AI startup, where users contribute proprietary data in exchange for points to access AI services\n",
      "- Potential applications for crowd-sourced data in an AI startup include architecture, medical and health data, legal document analysis, art and creative work, finance and investment, scientific research data, and manufacturing and production data.\n"
     ]
    }
   ],
   "source": [
    "\n",
    "\n",
    "map_reduce_summarize_chain = Shared().map_reduce_summarize_chain\n",
    "result = map_reduce_summarize_chain.run(docs)\n",
    "\n",
    "print(result)\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.4"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}