Tag Archive | "Machine"

Etsy CEO: Machine Learning is Opening Up a Whole New Opportunity

Etsy CEO Josh Silverman says that “machine learning is opening up a whole new opportunity” for the company to organize 50 million items into a discovery platform that makes buying an enjoyable experience and also is profitable for sellers.

Josh Silverman, CEO of Etsy, recently talked about their much-improved business and why it is working so well with Jim Cramer on CNBC:

Our Mission is Keeping Commerce Human

Our mission is keeping commerce human. It’s really about a world where automation is changing the nature of work and we’re all buying more and more commoditized things from the same few fulfillment centers. Allowing someone to harness their creative energy, turn that creativity into a business, and then connect with someone in another part of the country or another part of the world – that’s really special. We think there’s an ever-increasing need for that in this world.

It’s about value. We’ve been really focused on delivering more value for our makers. Etsy really is a platform that brings buyers to sellers, and that’s very valuable. We raised our commission from 3.5 percent to 5 percent, which I think is fair value for our sellers, particularly because we’re reinvesting 80 percent of that into the growth of the platform.

Free shipping is pretty much table stakes today, yet only about 20 percent of items have free shipping. Buyers say about half of all the items on Etsy have shipping prices that are too high, and yet we grew GMS at 20 percent last quarter.

Machine Learning is Opening Up a Whole New Opportunity

Machine learning is opening up a whole new opportunity for us to take 50 million items from two million makers and make sense of that for people. We have 37 million active buyers now and many of them come just for discovery, just to see what they can find, and that is exactly the right thing for someone out there. Our job is to create that love connection. Etsy over the past 14 years, with a large team effort, has I think done a great job.

One thing I want to emphasize is the quality and the craftsmanship of so many of the products on Etsy. That’s something that has been such a delight for me. People like Kringle Workshops make these incredible products. What we have been doing a better job of, and need to continue to do a better job of, is really surfacing the beautiful, artisanally crafted products that are available at a really fair price. You’re not having to pay for warehousing, you’re not having to pay for all the other things that mass-produced goods have to pay for; you’re buying directly from the person who made it. So it can be beautiful, handcrafted, and well priced.

There are 2 million sellers – 87 percent of them are women, and over 90 percent are working from home or are businesses of one – who can create a global business from their garage or their living room. Etsy does provide a real sense of community for them, and that’s really powerful.

Amazon May Open New HQ in Queens Near Etsy

We feel great about our employee value proposition, come what may. Here’s what we have going for us. We think we’ve got the best team, certainly among tech companies on the Eastern Seaboard, and we continue to attract great talent. The reason is, first and foremost, that ours is a really meaningful, important mission, and that matters. Great people want to work in a place with a great mission.

Second, our technology challenges are interesting. For example, search and using machine learning to make sense of 50 million items that don’t map to a catalog. Third, our culture is really special. We have been a company that’s authentically cared about diversity from the beginning. Over 50 percent of our executive staff are women, we have a balanced board, 50 percent male and female, and 32 percent of our engineers are female, which is twice the industry average. People who care about diversity and inclusion really want to come to work at Etsy. All of that is going for us and we’re happy to compete with whoever we need to.

Earnings Call Comments by Etsy CEO:

Active Buyers Grew 17 Percent

Etsy’s growth accelerated again in the third quarter to nearly 21% on a constant-currency basis. Revenue growth exceeded 41%, fueled by the launch of our new pricing structure, and our adjusted EBITDA margins grew to nearly 23%, while we also increased our investments in the business.

Active buyers grew 17% to 37 million worldwide. This is the fourth consecutive quarter that GMS has grown faster than active buyers, evidence that we are seeing increased buyer activity on the platform, which is a key proxy for improvement in frequency. We grew the number of active sellers by 8% and GMS per active seller is also increasing.

Two principal levers contributed to our progress this past quarter. The first is our continued product investment, focused on improving the shopping experience on Etsy. By making it easier to find and buy the great products available for sale on Etsy, we’re doing a better job converting visits into purchases. The second lever was our new pricing structure, which enabled us to ramp up investments in marketing, shipping improvements and customer support.

Successful Cloud Migration

We achieved a significant milestone in our cloud migration this quarter, successfully migrating our marketplace, Etsy.com, and our mobile applications to the Google Cloud with minimal disruption to buyers and sellers. This increases our confidence that the migration will be complete by the end of 2019.

Once fully migrated, we expect to dramatically increase the velocity of experiments and product development to iterate faster and leverage more complex search and machine learning models with the goal of rapidly innovating, improving search and ultimately driving GMS growth.

In fact, we’re beginning to see some of those benefits today based on the systems we’ve already migrated. I’d like to thank our engineering team for their incredible work to get us to this point.

 

The post Etsy CEO: Machine Learning is Opening Up a Whole New Opportunity appeared first on WebProNews.



SearchCap: Facebook food ordering, Google Posts automation & machine learning

Below is what happened in search today, as reported on Search Engine Land and from other places across the web.

The post SearchCap: Facebook food ordering, Google Posts automation & machine learning appeared first on Search Engine Land.



Please visit Search Engine Land for the full article.




Grist for the Machine


Much like publishers, employees at the big tech monopolies can end up little more than grist.

Products & product categories come & go, but even if you build “the one” you still may lose everything in the process.

Imagine building the most successful consumer product of all time only to realize: ‘The iPhone is the reason I’m divorced,’ Andy Grignon, a senior iPhone engineer, tells me. I heard that sentiment more than once throughout my dozens of interviews with the iPhone’s key architects and engineers. ‘Yeah, the iPhone ruined more than a few marriages,’ says another.

Microsoft is laying off thousands of salespeople.

Google colluded with competitors to sign anti-employee agreements & now they are trying to hold down labor costs with modular housing built on leased government property. They can tout innovation they bring to Africa, but at their core the tech monopolies are still largely abusive. What’s telling is that these companies keep using their monopoly profits to buy more real estate near their corporate headquarters, keeping jobs there in spite of the extreme local living costs.

“There’s been essentially no dispersion of tech jobs,” said Mr. Kolko, who conducted the research. “Which metro is the next Silicon Valley? The answer is none, at least for the foreseeable future. Silicon Valley still stands apart.”

Making $180,000 a year can price one out of the local real estate market, requiring living in a van or a two-hour commute. An $81,000 salary can require a three-hour commute.

If you are priced out of the market by the monopoly du jour, you can always pray!

The hype surrounding transformative technology that disintermediates geography & other legacy restraints only lasts so long: “The narrative isn’t the product of any single malfunction, but rather the result of overhyped marketing, deficiencies in operating with deep learning and GPUs and intensive data preparation demands.”

AI is often a man standing behind a curtain.

The big tech companies are all about equality, opportunity & innovation. At some point either the jobs move to China or China-like conditions have to move to the job. No benefits, insurance cost passed onto the temp worker, etc.

Google’s outsourced freelance workers have to figure out how to pay for their own health insurance:

A manager named LFEditorCat told the raters in chat that the pay cut had come at the behest of ‘Big G’s lawyers,’ referring to Google. Later, a rater asked Jackson, ‘If Google made this change, can Google reverse this change, in theory?’ Jackson replied, ‘The chances of this changing are less than zero IMO.’

That’s rather unfortunate, as the people who watch the beheading videos will likely need PTSD treatment.

The tech companies are also leveraging many “off the books” employees for last mile programs, where the wage is anything but livable after the cost of fuel, insurance & vehicle maintenance. They are accelerating the worst aspects of consolidated power:

America really is undergoing a radical change in the structure of our political economy. And yet this revolutionary shift of power, control, and wealth has remained all but unrecognized and unstudied … Since the 1990s, large companies have increasingly relied on temporary help to do work that formerly was performed by permanent salaried employees. These arrangements enable firms to hire and fire workers with far greater flexibility and free them from having to provide traditional benefits like unemployment insurance, health insurance, retirement plans, and paid vacations. The workers themselves go by many different names: temps, contingent workers, contractors, freelancers. But while some fit the traditional sense of what it means to be an entrepreneur or independent business owner, many, if not most, do not – precisely because they remain entirely dependent on a single power for their employment.

Dedication & devotion are important traits. Are you willing to do everything you can to go the last mile? “Lyft published a blog post praising a driver who kept picking up fares even after she went into labor and was driving to the hospital to give birth.”

Then again, the health industry is a great driver of consumption:

About 1.8 million workers were out of the labor force for “other” reasons at the beginning of this year, meaning they were not retired, in school, disabled or taking care of a loved one, according to Atlanta Federal Reserve data. Of those people, nearly half — roughly 881,000 workers — said in a survey that they had taken an opioid the day before, according to a study published last year by former White House economist Alan Krueger.

Creating fake cancer patients is a practical way to make sales.

That is until they stop some of the scams & view those people as no longer worth the economic cost. Those people are only dying off at a rate of about 90 people a day. Long commutes are associated with depression. And enough people are taking anti-depressants that it shows up elsewhere in the food chain.

Rehabilitation is hard work:

After a few years of buildup, Obamacare kicked the scams into high gear. … With exchange plans largely locked into paying for medically required tests, patients (and their urine) became gold mines. Some labs started offering kickbacks to treatment centers, who in turn began splitting the profits with halfway houses that would tempt clients with free rent and other services. … Street-level patient brokers and phone room lead generators stepped up to fill the beds with strategies across the ethical spectrum, including signing addicts up for Obamacare and paying their premiums.

Google made a lot of money from that scam until it got negative PR coverage.

At the company, we’re family. Once you are done washing the dishes, you can live in the garage. Just make sure you juice!

When platform monopolies dictate the roll-out of technology, there is less and less innovation, fewer places to invest, less to invent. Eventually, the rhetoric of innovation turns into DISRUPT, a quickly canceled show on MSNBC, and Juicero, a Google-backed punchline.

This moment of stagnating innovation and productivity is happening because Silicon Valley has turned its back on its most important political friend: antitrust. Instead, it’s embraced what it should understand as the enemy of innovation: monopoly.

And the snowflake narrative not only relies on the “off the books” marginalized freelance employees to maintain lush benefits for the core employees, but those core employees can easily end up thrown under the bus because accusation is guilt. Uniformity of political ideology is the zenith of a just world.

Celebrate diversity in all aspects of life – except thought™.

Free speech is now considered violence. Free speech has real cost. So if you disagree with someone, ‘people you might have to work with may simply punch you in the face’ – former Google diversity expert Yonatan Zunger.

Anything but the facts!

Mob rule – with a splash of violence – for the win.

Social justice is the antithesis of justice.

It is the aspie guy getting fired for not understanding the full gender “spectrum.”

It is the repression of truth: “Truth equals virtue equals happiness. You cannot solve serious social problems by telling lies or punishing people who tell truth.”

Most meetings at Google are recorded. Anyone at Google can watch it. We’re trying to be really open about everything…except for this. They don’t want any paper trail for any of these things. They were telling us about a lot of these potentially illegal practices that they’ve been doing to try to increase diversity. Basically treating people differently based on what their race or gender are. – James Damore

The recursive feedback loops & reactionary filtering are so bad that some sites promoting socialism are now being dragged to the Google gulag.

In a set of guidelines issued to Google evaluators in March, elaborated in April by Google VP of Engineering Ben Gomes, the company instructed its search evaluators to flag pages returning ‘conspiracy theories’ or ‘upsetting’ content unless ‘the query clearly indicates the user is seeking an alternative viewpoint.’ The changes to the search rankings of WSWS content are consistent with such a mechanism. Users of Google will be able to find the WSWS if they specifically include ‘World Socialist Web Site’ in their search request. But if their inquiry simply includes terms such as ‘Trotsky,’ ‘Trotskyism,’ ‘Marxism,’ ‘socialism’ or ‘inequality,’ they will not find the site.

Every website which has a following & challenges power is considered “fake news” or “conspiracy theory” until many years later, when many of the prior “nutjob conspiracies” turn out to be accurate representations of reality.

Under its new so-called anti-fake-news program, Google algorithms have in the past few months moved socialist, anti-war, and progressive websites from previously prominent positions in Google searches to positions up to 50 search result pages from the first page, essentially removing them from the search results any searcher will see. Counterpunch, World Socialist Web Site, Democracy Now, the American Civil Liberties Union, and WikiLeaks are just a few of the websites which have experienced severe reductions in their returns from Google searches.

In the meantime townhall meetings celebrating diversity will be canceled & differentiated voices will be marginalized to protect the mob from themselves.

What does the above say about tech monopolies wanting to alter the structure of society when their internal ideals are based on fundamental lies? They can’t hold an internal meeting addressing sacred cows because “ultimately the loudest voices on the fringes drive the perception and reaction” but why not let them distribute swarms of animals with bacteria & see what happens? Let’s make Earth a beta.

FANG

Monopoly platforms are only growing more dominant by the day.

Over the past three decades, the U.S. government has permitted corporate giants to take over an ever-increasing share of the economy. Monopoly – the ultimate enemy of free-market competition – now pervades every corner of American life … Economic power, in fact, is more concentrated than ever: According to a study published earlier this year, half of all publicly traded companies have disappeared over the past four decades.

And you don’t have to subscribe to deep state conspiracy theory in order to see the impacts.

The revenue, value & profit transfer is overt:

It is no coincidence that from 2012 to 2016, Amazon, Google and Facebook’s revenues increased by $137 billion and the remaining Fortune 497 revenues contracted by $97 billion.

Netflix, Amazon, Apple, Google, Facebook … are all aggressively investing in video content as bandwidth is getting cheaper & they need differentiated content to drive subscription revenues. If the big players are bidding competitively to have differentiated video content that puts a bid under some premium content, but for ad-supported content the relatively high CPMs on video content might fall sharply in the years to come.

From a partner perspective, if you only get a percent of revenue that transfers all the risk onto you, how is the new Facebook video feature going to be any better than being a YouTube partner? As video becomes more widespread, won’t that lower CPMs?

No need to guess:

One publisher said its Facebook-monetized videos had an average CPM of 15 cents. A second publisher, which calculated ad rates based on video views that lasted long enough to reach the ad break, said the average CPM for its mid-rolls is 75 cents. A third publisher made roughly $500 from more than 20 million total video views on that page in September.

That’s how monopolies work. Whatever is hot at the moment gets pitched as the future, but underneath the hood all compliments get commoditized:

as a result of this increased market power, the big superstar companies have been raising their prices and cutting their wages. This has lifted profits and boosted the stock market, but it has also held down real wages, diverted more of the nation’s income to business owners, and increased inequality. It has also held back productivity, since raising prices restricts economic output.

The future of the web is closed, proprietary silos that mirror what existed before the web:

If in five years I’m just watching NFL-endorsed ESPN clips through a syndication deal with a messaging app, and Vice is just an age-skewed Viacom with better audience data, and I’m looking up the same trivia on Genius instead of Wikipedia, and ‘publications’ are just content agencies that solve temporary optimization issues for much larger platforms, what will have been the point of the last twenty years of creating things for the web?

They’ve all won their respective markets & are now converging:

We’ve been in the celebration phase all year as Microsoft, Google, Amazon, Apple, Netflix and Facebook take their place in the pantheon of classic American monopolists. These firms and a few others, it is now widely acknowledged, dominate everything. There is no day-part in which they do not dominate the battle for consumers’ attention. There is no business safe from their ambitions. There are no industries in which their influence and encroachment are not currently being felt.

The web shifts information-based value chains to universal distribution at zero marginal cost, which shifts most of the value extraction to the attention merchants.

The raw feed stock for these centralized platforms isn’t particularly profitable:

despite a user base near the size of Instagram’s, Tumblr never quite figured out how to make money at the level Facebook has led managers and shareholders to expect … running a platform for culture creation is, increasingly, a charity operation undertaken by larger companies. Servers are expensive, and advertisers would rather just throw money at Facebook than take a chance

Those resting in the shadows of the giants will keep getting crushed: “They let big tech crawl, parse, and resell their IP, catalyzing an extraordinary transfer in wealth from the creators to the platforms.”

They’ll take the influence & margins, but not the responsibility normally associated with such a position:

“Facebook has embraced the healthy gross margins and influence of a media firm but is allergic to the responsibilities of a media firm,” Mr. Galloway says. … For Facebook, a company with more than $14 billion in free cash flow in the past year, to say it is adding 250 people to its safety and security efforts is ‘pissing in the ocean,’ Mr. Galloway says. ‘They could add 25,000 people, spend $1 billion on AI technologies to help those 25,000 employees sort, filter and ID questionable content and advertisers, and their cash flow would decline 10% to 20%.’

It’s why there’s a management shake-up at Pandora, SoundCloud laid off 40% of their staff & Vimeo canceled their subscription service before it was even launched.

With the winners of the web determined, it’s time to start locking down the ecosystem with DRM:

Practically speaking, bypassing DRM isn’t hard (Google’s version of DRM was broken for six years before anyone noticed), but that doesn’t matter. Even low-quality DRM gets the copyright owner the extremely profitable right to stop their customers and competitors from using their products except in the ways that the rightsholder specifies. … for a browser to support EME, it must also license a “Content Decryption Module” (CDM). Without a CDM, video just doesn’t work. All the big incumbents advocating for DRM have licenses for CDMs, but new entrants to the market will struggle to get these CDMs, and in order to get them, they have to make promises to restrict otherwise legal activities … We’re dismayed to see the W3C literally overrule the concerns of its public interest members, security experts, accessibility members and innovative startup members, putting the institution’s thumb on the scales for the large incumbents that dominate the web, ensuring that dominance lasts forever.

After years of loosey goosey privacy violations by the tech monopoly players, draconian privacy laws will block new competitors:

More significantly, the GDPR extends the concept of ‘personal data’ to bring it into line with the online world. The regulation stipulates, for example, that an online identifier, such as a device’s IP address, can now be personal data. So next year, a wide range of identifiers that had hitherto lain outside the law will be regarded as personal data, reflecting changes in technology and the way organisations collect information about people. … Facebook and Google should be OK, because they claim to have the ‘consent’ of their users. But the data-broking crowd do not have that consent.

GDPR is less than eight months away.

If you can’t get the fat-thumb accidental mobile ad clicks, then you need to convert formerly free services to a paid version or sell video ads. Yahoo! shut down most of their verticals, was acquired by Verizon, and is now part of Oath. Oath’s strategy is so sound that Katie Couric left:

Oath’s video unit, however, had begun doubling down on the type of highly shareable, ‘snackable’ bites that people gobble up on their smartphones and Facebook feeds. … What frustrates her like nothing else, two people close to Couric told me, is when she encounters fans and they ask her what she’s up to these days.

When content is atomized into the smallest bits & recycling is encouraged only the central network operators without editorial content costs win.

Even Reddit is pushing crappy autoplay videos for the sake of ads. There’s no chance of it working for them, but they’ll still try, as Google & Facebook have enviable market caps.

Mic laid off journalists and is pivoting to video.

It doesn’t work, but why not try.

The TV networks which focused on the sort of junk short-form video content that is failing online are also seeing low ratings.

Probably just a coincidence.

Some of the “innovative” upstart web publishers are recycling TV ads as video content to run pre-roll ads on. An ad inside an ad.

Some suggest the repackaging and reposting of ads highlights the ‘pivot to video’ mentality many publishers now demonstrate. The push to churn out video content to feed platforms and to attract potentially lucrative video advertising is increasingly viewed as a potential solution to an increasingly challenging business model problem.

Publishers might also get paid a commission on any sales they help drive by including affiliate links alongside the videos. If these links drive users to purchase the products, then the publisher gets a cut.

Is there any chance recycling low quality infomercial styled ads as placeholder auto-play video content to run prerolls on is a sustainable business practice?

If that counts as strategic thinking in online publishing, count me as a short.

For years, whenever the Adobe Flash plugin for Firefox had a security update, users who hit the page got a negative-option install of Google Chrome as their default web browser. And Google constantly markets Chrome across their properties:

Google is aggressively using its monopoly position in Internet services such as Google Mail, Google Calendar and YouTube to advertise Chrome. Browsers are a mature product and it’s hard to compete in a mature market if your main competitor has access to billions of dollars worth of free marketing.

It only takes a single yes on any of those billions of ad impressions (or an accidental opt-in on the negative-option bundling with security updates) for the default web browser to change permanently.

There’s no way Mozilla can compete with Google on economics trying to buy back an audience.

Mozilla is willing to buy influence, too – particularly in mobile, where it’s so weak. One option is paying partners to distribute Firefox on their phones. ‘We’re going to have to put money toward it,’ Dixon says, but she expects it’ll pay off when Mozilla can share revenue from the resulting search traffic.

They have no chance of winning when they focus on wedge issues like fake news. Much like their mobile operating system, it is a distraction. And the core economics of paying for distribution won’t work either. How can Mozilla get a slice of an advertiser’s ad budget through Yahoo through Bing & compete against Google’s bid?

Google is willing to enter uneconomic deals to keep their monopoly power. Look no further than the $1 billion investment they made in AOL, which they quickly wrote down by $726 million.

Google pays Apple $3 billion PER YEAR to be the default search provider in Safari. Verizon acquired Yahoo! for $4.48 billion. There’s no chance of Yahoo! outbidding Google for default Safari search placement & if Apple liked the idea they would have bought Yahoo!. It is hard to want to take a big risk & spend billions on something that might not back out when you get paid billions to not take any risk.

Even Microsoft would be taking a big risk in making a competitive bid for the Apple search placement. Microsoft recently disclosed “Search advertising revenue increased $124 million or 8%.” If $124 million is 8%, then their quarterly search ad revenue is about $1.674 billion. To outbid Google’s $3 billion a year, they would have to bid nearly half of their annualized search revenues.

Regulatory Capture

“I have a foreboding of an America in my children’s or grandchildren’s time – when the United States is a service and information economy; when nearly all the key manufacturing industries have slipped away to other countries; when awesome technological powers are in the hands of a very few, and no one representing the public interest can even grasp the issues; when the people have lost the ability to set their own agendas or knowledgeably question those in authority; when, clutching our crystals and nervously consulting our horoscopes, our critical faculties in decline, unable to distinguish between what feels good and what’s true, we slide, almost without noticing, back into superstition and darkness. The dumbing down of America is most evident in the slow decay of substantive content in the enormously influential media, the 30-second sound bites (now down to 10 seconds or less), lowest common denominator programming, credulous presentations on pseudoscience and superstition, but especially a kind of celebration of ignorance.” – Carl Sagan, The Demon-Haunted World, 1996


The monopoly platforms have remained unscathed by government regulatory efforts in the U.S. Google got so good at lobbying they made Goldman Sachs look like amateurs. It never hurts to place your lawyers in the body that (should) regulate you: “Wright left the FTC in August 2015, returning to George Mason. Just five months later, he had a new position as ‘of counsel’ at Wilson Sonsini, Google’s primary outside law firm.”

Remember how Google engineers repeatedly announced how people who bought or sold links without clear machine & human readable disclosure are scum? One way to take .edu link building to the next level is to sponsor academic research without disclosure:

Some researchers share their papers before publication and let Google give suggestions, according to thousands of pages of emails obtained by the Journal in public-records requests of more than a dozen university professors. The professors don’t always reveal Google’s backing in their research, and few disclosed the financial ties in subsequent articles on the same or similar topics, the Journal found. … Google officials in Washington compiled wish lists of academic papers that included working titles, abstracts and budgets for each proposed paper – then they searched for willing authors, according to a former employee and a former Google lobbyist. … Mr. Sokol, though, had extensive financial ties to Google, according to his emails obtained by the Journal. He was a part-time attorney at the Silicon Valley law firm of Wilson Sonsini Goodrich & Rosati, which has Google as a client. The 2016 paper’s co-author was also a partner at the law firm, which didn’t respond to requests for comment.

As bad as that is, Google has nonprofit think tanks fire ENTIRE TEAMS if they suggest regulatory action against Google is just:

“We are in the process of trying to expand our relationship with Google on some absolutely key points,” Ms. Slaughter wrote in an email to Mr. Lynn, urging him to “just THINK about how you are imperiling funding for others.”

“What happened has little to do with New America, and everything to do with Google and monopoly power. One reason that American governance is dysfunctional is because of the capture of much academic and NGO infrastructure by power. That this happened obviously and clumsily at one think tank is not the point. The point is that this is a *system* of power. I have deep respect for the scholars at New America and the work done there. The point here is how *Google* and monopolies operate. I’ll make one other political point about monopoly power. Democracies all over the world are seeing an upsurge in anger. Why? Scholars have tended to look at political differences, like does a different social safety net have an impact on populism. But it makes more sense to understand what countries have in common. Multi-nationals stretch over… multiple nations. So if you think, as we do, that corporations are part of our political system, then populism everywhere monopolies operate isn’t a surprise. Because these are the same monopolies. Google is part of the American political system, and the European one, and so on and so forth.” – Matt Stoller

Any dissent of Google is verboten:

in recent years, Google has become greedy about owning not just search capacities, video and maps, but also the shape of public discourse. As the Wall Street Journal recently reported, Google has recruited and cultivated law professors who support its views. And as the New York Times recently reported, it has become invested in building curriculum for our public schools, and has created political strategy to get schools to adopt its products. This year, Google is on track to spend more money than any company in America on lobbying.

“I just got off the phone with Eric Schmidt and he is pulling all of his money.” – Anne-Marie Slaughter

They not only directly control the think tanks, but also state who & what the think tanks may fund:

Google’s director of policy communications, Bob Boorstin, emailed the Rose Foundation (a major funder of Consumer Watchdog) complaining about Consumer Watchdog and asking the charity to consider “whether there might be better groups in which to place your trust and resources.”

They can also, you know, blackball your media organization or outright penalize you. The more aggressive you are with monetization the more leverage they have to arbitrarily hit you if you don’t play ball.

Six years ago, I was pressured to unpublish a critical piece about Google’s monopolistic practices after the company got upset about it. In my case, the post stayed unpublished. I was working for Forbes at the time, and was new to my job.

Google never challenged the accuracy of the reporting. Instead, a Google spokesperson told me that I needed to unpublish the story because the meeting had been confidential, and the information discussed there had been subject to a non-disclosure agreement between Google and Forbes. (I had signed no such agreement, hadn’t been told the meeting was confidential, and had identified myself as a journalist.)

Sometimes the threat is explicit:

“You’re already asking very difficult questions to Mr. Juncker,” the YouTube employee said before Birbes’ interview in an exchange she captured on video. “You’re talking about corporate lobbies. You don’t want to get on the wrong side of YouTube and the European Commission… Well, except if you don’t care about having a long career on YouTube.”

A concentrated source of power manipulating the media is not new; rather, it is typical. Which is precisely why monopolies should be broken up once they have a track record of abusing the public trust:

As more and more of the economy becomes sewn up by monopolistic corporations, there are fewer and fewer opportunities for entrepreneurship. … By design, the private business corporation is geared to pursue its own interests. It’s our job as citizens to structure a political economy that keeps corporations small enough to ensure that their actions never threaten the people’s sovereignty over our nation.

How much control can one entity get before it becomes excessive?

Google controls upwards of 80 percent of global search – and the capital to either acquire or crush any newcomers. They are bringing us hardly a gilded age of prosperity, but depressed competition, economic stagnation, and, increasingly, a chilling desire to control the national conversation.

Google thinks their business is too complex to exist in a single organization. They restructured to minimize their legal risks:

The switch is partly related to Google’s transformation from a listed public company into a business owned by a holding company. The change helps keep potential challenges in one business from spreading to another, according to Dana Hobart, a litigator with the Buchalter law firm in Los Angeles.

Isn’t that an admission they should be broken up?

Early Xoogler Doug Edwards wrote: “[Larry Page] wondered how Google could become like a better version of the RIAA – not just a mediator of digital music licensing – but a marketplace for fair distribution of all forms of digitized content.”

A better version of the RIAA as a north star sure seems like an accurate analogy:

In an explosive new allegation, a renowned architect has accused Google of racketeering, saying in a lawsuit the company has a pattern of stealing trade secrets from people it first invites to collaborate. … ‘It’s cheaper to steal than to develop your own technology,’ Buether said. ‘You can take it from somebody else and you have a virtually unlimited budget to fight these things in court.’ … ‘It’s even worse than just using the proprietary information – they actually then claim ownership through patent applications,’ Buether said.

The following slide expresses Google’s views on premium content

No surprise the Content Creators Coalition called for Congressional Investigation into Google’s Distortion of Public Policy Debates:

Google’s efforts to monopolize civil society in support of the company’s balance-sheet-driven agenda is as dangerous as it is wrong. For years, we have watched as Google used its monopoly powers to hurt artists and music creators while profiting off stolen content. For years, we have warned about Google’s actions that stifle the views of anyone who disagrees with its business practices, while claiming to champion free speech.

In a world where monopolies are built with mission statements like ‘to organize the world’s information and make it universally accessible and useful,’ it makes sense to seal court documents and bury regulatory findings – or else the slogan doesn’t fit, as the consumer harm was obvious.

“The 160-page critique, which was supposed to remain private but was inadvertently disclosed in an open-records request, concluded that Google’s ‘conduct has resulted – and will result – in real harm to consumers.’ ” But Google was never penalized, because the political appointees overrode the staff recommendation, an action rarely taken by the FTC. The Journal pointed out that Google, whose executives donated more money to the Obama campaign than any company, had held scores of meetings at the White House between the time the staff filed its report and the ultimate decision to drop the enforcement action.

Some scrappy (& perhaps masochistic) players have been fighting the monopoly game for over a decade:

June 2006: Foundem’s Google search penalty begins. Foundem starts an arduous campaign to have the penalty lifted.
September 2007: Foundem is ‘whitelisted’ for AdWords (i.e. Google manually grants Foundem immunity from its AdWords penalty).
December 2009: Foundem is ‘whitelisted’ for Google natural search (i.e. Google manually grants Foundem immunity from its search penalty).

For many years Google has “manipulated search results to favor its own comparison-shopping service. … Google both demotes competitors’ offerings in search rankings and artificially inserts its own service in a box above all other search results, regardless of their relevance.”

After losing for over a decade, a win was finally delivered on the 27th of June, when the European Commission issued the equivalent of a manual action to negate the spam, fining Google €2.42 billion for abusing its dominance as a search engine by giving an illegal advantage to its own comparison shopping service.

“What Google has done is illegal under EU antitrust rules. It denied other companies the chance to compete on the merits and to innovate. And most importantly, it denied European consumers a genuine choice of services and the full benefits of innovation.” – Margrethe Vestager

That fine looks to be the first of multiple record-breaking fines as “Sources expect the Android fine to be substantially higher than the shopping penalty.”

That fine was well deserved:

Quoting internal Google documents and emails, the report shows that the company created a list of rival comparison shopping sites that it would artificially lower in the general search results, even though tests showed that Google users ‘liked the quality of the [rival] sites’ and gave negative feedback on the proposed changes. Google reworked its search algorithm at least four times, the documents show, and altered its established rating criteria before the proposed changes received ‘slightly positive’ user feedback. … Google’s displayed prices for everyday products, such as watches, anti-wrinkle cream and wireless routers, were roughly 50 percent higher – sometimes more – than those on rival sites. A subsequent study by a consumer protection group found similar results. A study by the Financial Times also documented the higher prices.

Nonetheless, Google is appealing it. The ease with which Google quickly crafted a response was telling.

The competitors who were slaughtered by monopolistic bundling won’t recover: ‘The damage has been done. The industry is on its knees, and this is not going to put it back,’ said Mr. Stables, who has decided to participate in Google’s new auctions despite misgivings. ‘I’m sort of shocked that they’ve come out with this,’ he added.

Google claims they’ll be running their EU shopping ads as a separate company with positive profit margins & that advertisers won’t be bidding against themselves if they are on multiple platforms. Anyone who believes that stuff hasn’t dropped a few thousand dollars on a Flash-only website after AdWords turned on Enhanced campaigns against their wishes – charging the advertisers dollars per click to send users to a blank page which would not load.

Hell may freeze over, causing the FTC to look into Google’s Android bundling similarly to how Microsoft’s OS bundling was looked at.

If hell doesn’t freeze over, it is likely because Google further ramped up their lobbying efforts, donating to political organizations they claim to be ideologically opposed to.

The Fight Against Rising (& Declining) Nationalism

As a global corporation above & beyond borders, Google has long been against nationalism. Eric Schmidt’s Hillary Clinton once wrote: “My dream is a hemispheric common market, with open trade and open borders, some time in the future with energy that is as green and sustainable as we can get it, powering growth and opportunity for every person in the hemisphere.”

Apparently Google flacks did not get that memo (or they got the new memo about Eric Schmidt’s Donald Trump), because they were quick to denounce the European Commission’s move as anti-American:

We are writing to express our deep concerns about the European Union’s aggressive and heavy-handed antitrust enforcement action against American companies. It has become increasingly clear that, rather than being grounded in a transparent legal framework, these various investigations and complaints are being driven by politics and protectionist policies that harm open-competition practices, consumers, and unfairly target American companies.

The above nonsense was in spite of Yelp carrying a heavy load.

Yelp celebrated the victory: “Google has been found guilty of engaging in illegal conduct with the aim of promoting its vertical search services. Although the decision addresses comparison shopping services, the European Commission has also recognized that the same illegal behavior applies to other verticals, including local search.”

The EU is also looking for an expert to monitor Google’s algorithm. It certainly isn’t hard to find areas where the home team wins.


SEO Book


Machine learning for large-scale SEM accounts

Can machine learning be applied to your PPC accounts to make them more efficient? Columnist David Fothergill describes how he utilized machine learning to find new keywords for his campaigns.

The post Machine learning for large-scale SEM accounts appeared first on Search Engine Land.



Please visit Search Engine Land for the full article.





SearchCap: Machine learning, content marketing & search rankings

Below is what happened in search today, as reported on Search Engine Land and from other places across the web.

The post SearchCap: Machine learning, content marketing & search rankings appeared first on Search Engine Land.



Please visit Search Engine Land for the full article.




The Machine Learning Revolution: How it Works and its Impact on SEO

Posted by EricEnge

Machine learning is already a very big deal. It’s here, and it’s in use in far more businesses than you might suspect. A few months back, I decided to take a deep dive into this topic to learn more about it. In today’s post, I’ll dive into a certain amount of technical detail about how it works, but I also plan to discuss its practical impact on SEO and digital marketing.

For reference, check out Rand Fishkin’s presentation about how we’ve entered into a two-algorithm world. Rand addresses in detail how machine learning influences search and SEO in that presentation. I’ll talk more about that later.

For fun, I’ll also include a tool that allows you to predict your chances of getting a retweet based on a number of things: your Followerwonk Social Authority, whether you include images, hashtags, and several other similar factors. I call this tool the Twitter Engagement Predictor (TEP). To build the TEP, I created and trained a neural network. The tool will accept input from you, and then use the neural network to predict your chances of getting an RT.

The TEP leverages the data from a study I published in December 2014 on Twitter engagement, where we reviewed information from 1.9M original tweets (as opposed to RTs and favorites) to see what factors most improved the chances of getting a retweet.

My machine learning journey

I got my first meaningful glimpse of machine learning back in 2011 when I interviewed Google’s Peter Norvig, and he told me how Google had used it to teach Google Translate.

Basically, they looked at all the language translations they could find across the web and learned from them. This is a very intense and complicated example of machine learning, and Google had deployed it by 2011. Suffice it to say that all the major market players — such as Google, Apple, Microsoft, and Facebook — already leverage machine learning in many interesting ways.

Back in November, when I decided I wanted to learn more about the topic, I started doing a variety of searches for articles to read online. It wasn’t long before I stumbled upon this great course on machine learning on Coursera. It’s taught by Andrew Ng of Stanford University, and it provides an awesome, in-depth look at the basics of machine learning.

Warning: This course is long (19 total sections with an average of more than one hour of video each). It also requires an understanding of calculus to get through the math. In the course, you’ll be immersed in math from start to finish. But the point is this: If you have the math background, and the determination, you can take a free online course to get started with this stuff.

In addition, Ng walks you through many programming examples using a language called Octave. You can then take what you’ve learned and create your own machine learning programs. This is exactly what I have done in the example program included below.

Basic concepts of machine learning

First of all, let me be clear: this process didn’t make me a leading expert on this topic. However, I’ve learned enough to provide you with a serviceable intro to some key concepts. You can break machine learning into two classes: supervised and unsupervised. First, I’ll take a look at supervised machine learning.

Supervised machine learning

At its most basic level, you can think of supervised machine learning as creating a series of equations to fit a known set of data. Let’s say you want an algorithm to predict housing prices (an example that Ng uses frequently in the Coursera classes). You might get some data that looks like this (note that the data is totally made up):

In this example, we have (fictitious) historical data that indicates the price of a house based on its size. As you can see, the price tends to go up as house size goes up, but the data does not fit into a straight line. However, you can calculate a straight line that fits the data pretty well, and that line might look like this:

This line can then be used to predict the pricing for new houses. We treat the size of the house as the “input” to the algorithm and the predicted price as the “output.” For example, if you have a house that is 2,600 square feet, the price looks like it would be about $xxxK.
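To make that concrete, here is a minimal sketch in Python (the Coursera course uses Octave, but the idea is identical): fit a straight line to made-up (size, price) training examples, then read a prediction off the line. Every number below is invented for illustration.

```python
# A made-up version of the house-price example: fit a straight line to
# (size, price) training examples, then predict the price of a new house.
import numpy as np

# Training examples: house size in square feet -> sale price in dollars
sizes = np.array([1100, 1400, 1750, 2100, 2400, 2900, 3300])
prices = np.array([199_000, 245_000, 290_000, 335_000, 369_000, 425_000, 475_000])

# Least-squares fit of: price = slope * size + intercept
slope, intercept = np.polyfit(sizes, prices, deg=1)

# Use the fitted line to predict the price of a 2,600 square foot house
new_size = 2600
predicted = slope * new_size + intercept
print(f"Predicted price for {new_size} sq ft: ${predicted:,.0f}")
```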

However, this model turns out to be a bit simplistic. There are other factors that can play into housing prices, such as the total rooms, number of bedrooms, number of bathrooms, and lot size. Based on this, you could build a slightly more complicated model, with a table of data similar to this one:

Already you can see that a simple straight line will not do, as you’ll have to assign weights to each factor to come up with a housing price prediction. Perhaps the biggest factors are house size and lot size, but rooms, bedrooms, and bathrooms all deserve some weight as well (all of these would be considered new “inputs”).

Even now, we’re still being quite simplistic. Another huge factor in housing prices is location. Pricing in Seattle, WA is different than it is in Galveston, TX. Once you attempt to build this algorithm on a national scale, using location as an additional input, you can see that it starts to become a very complex problem.

You can use machine learning techniques to solve any of these three types of problems. In each of these examples, you’d assemble a large data set of examples, which can be called training examples, and run a set of programs to design an algorithm to fit the data. This allows you to submit new inputs and use the algorithm to predict the output (the price, in this case). Using training examples like this is what’s referred to as “supervised machine learning.”

Classification problems

This a special class of problems where the goal is to predict specific outcomes. For example, imagine we want to predict the chances that a newborn baby will grow to be at least 6 feet tall. You could imagine that inputs might be as follows:

The output of this algorithm might be a 0 if the person was going to be shorter than 6 feet tall, or a 1 if they were going to be 6 feet or taller. What makes it a classification problem is that you are putting the input items into one specific class or another. For the height prediction problem as I described it, we are not trying to guess the precise height, but to make a simple over/under 6 feet prediction.

Some examples of more complex classification problems are handwriting recognition (recognizing characters) and identifying spam email.
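As a rough sketch of how such a classifier could be built, here is a logistic regression on invented data. Since the post’s actual input list lives in an image, the features below (parents’ heights and birth length) are hypothetical stand-ins:

```python
# Hypothetical classification sketch: predict whether a newborn will reach
# 6 feet (label 1) or not (label 0). Features and labels are invented.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: [father's height (in), mother's height (in), birth length (in)]
X = np.array([
    [70, 64, 20.0],
    [74, 68, 21.5],
    [68, 62, 19.5],
    [76, 70, 22.0],
    [71, 66, 20.5],
    [66, 60, 19.0],
])
# Labels: 1 = grew to 6 feet or taller, 0 = shorter (made-up outcomes)
y = np.array([0, 1, 0, 1, 1, 0])

model = LogisticRegression().fit(X, y)

# Classify a new baby, and show the probability behind the class decision
baby = [[73, 67, 21.0]]
print("Over 6 feet?", bool(model.predict(baby)[0]))
print("P(over 6 feet):", round(model.predict_proba(baby)[0, 1], 2))
```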

Unsupervised machine learning

Unsupervised machine learning is used in situations where you don’t have training examples. Basically, you want to try and determine how to recognize groups of objects with similar properties. For example, you may have data that looks like this:

The algorithm will then attempt to analyze this data and find out how to group them together based on common characteristics. Perhaps in this example, all of the red “x” points in the following chart share similar attributes:

However, the algorithm may have trouble recognizing outlier points, and may group the data more like this:

What the algorithm has done is find natural groupings within the data, but unlike supervised learning, it had to determine the features that define each group. One industry example of unsupervised learning is Google News. For example, look at the following screenshot:

You can see that the main news story is about Iran holding 10 US sailors, but there are also related news stories shown from Reuters and Bloomberg (circled in red). The grouping of these related stories is an unsupervised machine learning problem, where the algorithm learns to group these items together.
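Here is a minimal sketch of that grouping idea using k-means, one common clustering algorithm: it receives unlabeled points and must discover the clusters on its own. The data and the choice of two clusters are invented for illustration:

```python
# Unsupervised grouping sketch: k-means is handed unlabeled points and
# must discover the clusters itself. Data and cluster count are made up.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two blobs of 2-D points, with no labels attached to any of them
blob_a = rng.normal(loc=[2.0, 2.0], scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=[7.0, 6.0], scale=0.5, size=(50, 2))
points = np.vstack([blob_a, blob_b])

# Ask for two groups; the algorithm decides which points belong together
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print("First few cluster assignments:", kmeans.labels_[:5])
print("Cluster centers:\n", kmeans.cluster_centers_)
```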

Other industry examples of applied machine learning

A great example of a machine learning algo is the Author Extraction algorithm that Moz has built into their Moz Content tool. You can read more about that algorithm here. The referenced article outlines in detail the unique challenges that Moz faced in solving that problem, as well as how they went about solving it.

As for Stone Temple Consulting’s Twitter Engagement Predictor, this is built on a neural network. A sample screen for this program can be seen here:

The program makes a binary prediction as to whether you’ll get a retweet or not, and then provides you with a percentage probability for that prediction being true.

For those who are interested in the gory details, the neural network configuration I used was six input units, fifteen hidden units, and two output units. The algorithm used one million training examples and two hundred training iterations. The training process required just under 45 billion calculations.
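For readers who want to experiment, here is roughly what a network of that shape looks like when sketched with scikit-learn. The TEP’s actual training data and code aren’t reproduced here, so the features and labels below are random placeholders (and scikit-learn uses a single logistic output for binary problems rather than the two output units described):

```python
# Placeholder sketch of a 6-input, 15-hidden-unit network trained for 200
# iterations. The features (e.g. social authority, image count, hashtag
# count...) and labels are random stand-ins, not the TEP's real data.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(42)
X = rng.random((10_000, 6))                 # one row of 6 features per tweet
y = (rng.random(10_000) < 0.1).astype(int)  # 1 = retweeted, 0 = not

net = MLPClassifier(hidden_layer_sizes=(15,), max_iter=200, random_state=0)
net.fit(X, y)

# Like the TEP, report a percentage probability rather than a bare yes/no
tweet = rng.random((1, 6))
print(f"P(retweet) = {net.predict_proba(tweet)[0, 1]:.1%}")
```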

One thing that made this exercise interesting is that there are many conflicting data points in the raw data. Here’s an example of what I mean:

What this shows is the data for people with Followerwonk Social Authority between 0 and 9, and a tweet with no images, no URLs, no @mentions of other users, two hashtags, and between zero and 40 characters. We had 1,156 examples of such tweets that did not get a retweet, and 17 that did.

The most desirable outcome for the resulting algorithm is to predict that these tweets will not get a retweet, even though that prediction will be wrong 1.4% of the time (17 times out of 1,173). Note that the resulting neural network assesses the probability of getting a retweet at 2.1%.

I did a calculation to tabulate how many of these conflicting cases existed. I found that we had 102,045 individual training examples where it was desirable to make the “wrong” prediction – just slightly over 10% of all our training data. What this means is that the best the neural network will be able to do is make the right prediction just under 90% of the time.
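That ceiling comes from a simple tabulation: wherever identical inputs carry conflicting labels, even a perfect model must get the minority outcomes wrong. Here is a small sketch of the calculation, using the 1,156/17 bucket above as a stand-in:

```python
# Tabulating the irreducible error: group training rows by their feature
# vector, then count the minority-label examples, which even a perfect
# model must get wrong. The single bucket below mirrors the one described.
from collections import Counter

bucket = ("SA 0-9", "no image", "no URL", "no @mention", "2 hashtags", "0-40 chars")
examples = [(bucket, 0)] * 1156 + [(bucket, 1)] * 17  # (features, retweeted?)

label_counts = Counter(examples)  # (features, label) -> count
feature_vectors = {features for features, _ in label_counts}

# For each distinct feature vector, the minority-label count is unavoidable error
unavoidable = sum(
    min(label_counts[(f, 0)], label_counts[(f, 1)]) for f in feature_vectors
)
total = len(examples)
print(f"Irreducible error: {unavoidable}/{total} = {unavoidable / total:.1%}")
# -> 17/1173 = 1.4%, matching the figure above
```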

I also ran two other sets of data (470K and 473K samples in size) through the trained network to see the accuracy level of the TEP. I found that it was 81% accurate in its absolute (yes/no) prediction of the chance of getting a retweet. Bearing in mind that those also had approximately 10% of the samples where making the wrong prediction is the right thing to do, that’s not bad! And, of course, that’s why I show the percentage probability of a retweet, rather than a simple yes/no response.

Try the predictor yourself and let me know what you think! (You can discover your Social Authority by heading to Followerwonk and following these quick steps.) Mind you, this was simply an exercise for me to learn how to build out a neural network, so I recognize the limited utility of what the tool does — no need to give me that feedback ;->.

Examples of algorithms Google might have or create

So now that we know a bit more about what machine learning is about, let’s dive into things that Google may be using machine learning for already:

Penguin

One approach to implementing Penguin would be to identify a set of link characteristics that could potentially be an indicator of a bad link, such as these:

  1. External link sitting in a footer
  2. External link in a right side bar
  3. Proximity to text such as “Sponsored” (and/or related phrases)
  4. Proximity to an image with the word “Sponsored” (and/or related phrases) in it
  5. Grouped with other links with low relevance to each other
  6. Rich anchor text not relevant to page content
  7. External link in navigation
  8. Implemented with no user visible indication that it’s a link (i.e. no line under it)
  9. From a bad class of sites (from an article directory, from a country where you don’t do business, etc.)
  10. …and many other factors

Note that any one of these things isn’t necessarily inherently bad for an individual link, but the algorithm might start to flag sites if a significant portion of all of the links pointing to a given site have some combination of these attributes.

What I outlined above would be a supervised machine learning approach where you train the algorithm with known bad and good links (or sites) that have been identified over the years. Once the algo is trained, you would then run other link examples through it to calculate the probability that each one is a bad link. Based on the percentage of links (and/or total PageRank) coming from bad links, you could then make a decision to lower the site’s rankings, or not.
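As a hedged illustration (this is in no way Google’s actual implementation), that supervised approach might be wired up like this: train a classifier on links labeled good or bad, score a site’s inbound links, and aggregate the scores into a site-level signal. The features and data are invented:

```python
# Invented sketch of the supervised link-classifier idea: train on links
# labeled good/bad, then score a site by its share of bad-looking links.
# Nothing here is Google's actual Penguin implementation.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(7)
# One row per link: binary features like [in_footer, in_sidebar,
# near_sponsored, low_relevance_group, rich_offtopic_anchor, bad_site_class]
X_train = rng.integers(0, 2, size=(5_000, 6))
# Label 1 = known bad link, 0 = known good (in reality, years of manually
# reviewed examples; here a noisy rule stands in for that history)
y_train = (X_train.sum(axis=1) + rng.integers(0, 2, 5_000) >= 4).astype(int)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Score every inbound link of a site, then aggregate to a site-level signal
site_links = rng.integers(0, 2, size=(200, 6))
p_bad = clf.predict_proba(site_links)[:, 1]
print(f"Share of inbound links that look bad: {(p_bad > 0.5).mean():.0%}")
```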

Another approach to this same problem would be to start with a database of known good links and bad links, and then have the algorithm automatically determine the characteristics (or features) of those links. These features would probably include factors that humans may not have considered on their own.

Panda

Now that you’ve seen the Penguin example, this one should be a bit easier to think about. Here are some things that might be features of sites with poor-quality content:

  1. Small number of words on the page compared to competing pages
  2. Low use of synonyms
  3. Overuse of main keyword of the page (from the title tag)
  4. Large blocks of text isolated at the bottom of the page
  5. Lots of links to unrelated pages
  6. Pages with content scraped from other sites
  7. …and many other factors

Once again, you could start with a known set of good sites and bad sites (from a content perspective) and design an algorithm to determine the common characteristics of those sites.
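
To make the feature side concrete, here’s a hedged sketch of how a couple of the signals above might be computed from raw page text. The function, its field names, and the sample input are all hypothetical:

    # Compute simple content-quality signals from page text.
    import re
    from collections import Counter

    def content_features(text, title_keyword):
        words = re.findall(r"[a-z']+", text.lower())
        counts = Counter(words)
        total = max(1, len(words))
        return {
            "word_count": total,                       # signal 1: thin content
            "unique_word_ratio": len(counts) / total,  # rough proxy for synonym use
            "keyword_density": counts[title_keyword.lower()] / total,  # title-keyword overuse
        }

    page = "Cheap widgets. Buy cheap widgets here. Cheap widgets shipped fast."
    print(content_features(page, "widgets"))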

As with the Penguin discussion above, I’m in no way representing that these are all parts of Panda — they’re just meant to illustrate the overall concept of how it might work.

How machine learning impacts SEO

The key to understanding the impact of machine learning on SEO is understanding what Google (and other search engines) want to use it for. A key insight is that there’s a strong correlation between Google providing high-quality search results and the revenue they get from their ads.

Back in 2009, Bing and Google performed some tests that showed how even introducing small delays into their search results significantly impacted user satisfaction. In addition, those results showed that with lower satisfaction came fewer clicks and lower revenues.

The reason behind this is simple. Google has other sources of competition, and this goes well beyond Bing. Texting friends for their input is one form of competition. So are Facebook, Apple/Siri, and Amazon. Alternative sources of information and answers exist for users, and they are working to improve the quality of what they offer every day. So must Google.

I’ve already suggested that machine learning may be a part of Panda and Penguin, and it may well be a part of the “Search Quality” algorithm. And there are likely many more of these types of algorithms to come.

So what does this mean?

Given that higher user satisfaction is of critical importance to Google, you must now treat content quality and user satisfaction with your pages as an SEO ranking factor. You’re going to need to measure it and steadily improve it over time. Some questions to ask yourself include:

  1. Does your page meet the intent of a large percentage of visitors to it? If a user is interested in the product on that page, do they need help selecting it? Learning how to use it?
  2. What about related intents? If someone comes to your site looking for a specific product, what other related products could they be looking for?
  3. What gaps exist in the content on the page?
  4. Is your page a higher-quality experience than that of your competitors?
  5. What’s your strategy for measuring page performance and improving it over time?

There are many ways that Google can measure how good your page is, and use that to impact rankings. Here are some of them:

  1. When they arrive on your page after clicking on a SERP, how long do they stay? How does that compare to competing pages?
  2. How does the CTR on your SERP listing compare to that of competing listings?
  3. What volume of brand searches does your business get?
  4. If you have a page for a given product, do you offer thinner or richer content than competing pages?
  5. When users click back to the search results after visiting your page, do they behave like their task was fulfilled? Or do they click on other results or enter followup searches?

For more on how content quality and user satisfaction has become a core SEO factor, please check out the following:

  1. Rand’s presentation on a two-algorithm world
  2. My article on Term Frequency Analysis
  3. My article on Inverse Document Frequency
  4. My article on Content Effectiveness Optimization

Summary

Machine learning is becoming highly prevalent. The barrier to learning the basic algorithms is largely gone. All the major players in the tech industry are leveraging it in some manner. Here’s a little bit on what Facebook is doing, and on machine learning hiring at Apple. Others, such as Microsoft and Amazon, are offering platforms to make implementing machine learning easier.

For people involved in SEO and digital marketing, you can expect that these major players are going to get better and better at leveraging these algorithms to help them meet their goals. That’s why it will be of critical importance to tune your strategies to align with the goals of those organizations.

In the case of SEO, machine learning will steadily increase the importance of content quality and user experience over time. For you, that makes it time to get on board and make these factors a key part of your overall SEO strategy.



Moz Blog


The Amazing Copywriting Machine


Bill was a struggling copywriter with a big idea.

He’d spend hours and days laboring over headlines, landing pages, ads, and emails for his few clients, but the results of his work were often not worthy of comment.

He’d had enough.

One warm Friday night, he decided he would build a machine — a copywriting machine — that could be fed raw data on one end, and would spit out highly converting copy on the other.

If he got it right, the business world would beat a path to his desk.

Over the next seven months, Bill blew his life savings on the project. Then he secured bank loans and two rounds of VC funding to hire a world-class mechanic and a software engineer, and to procure the best industrial materials on earth.

He mined the Internet, consumer reports, psychological journals, and the great books of the ages for data. Then he brought in a top mathematician, an analytics expert, and the professor of logic from his alma mater.

The shop was humming. As word of his project got out, clients started sniffing around. Big clients.

Finally, one cold Friday night, Bill and a fellow copywriter — a friend who’d been skeptical of his machine all along — stood in front of the finished behemoth.

“Well, time to give it a go,” Bill said.

He picked up the eight hundred and seventy-seven pages of data on the buying habits of his first client’s product that he and his researchers had gathered … and fed it all into the copywriting machine.

The thing spit, buzzed, convulsed, steamed, and jerked back and forth for twelve minutes before belching a single page out of its back end. Bill, sweating and nervous, picked it up and began to read.

“Well?” his copywriter friend asked.

It was an almost unreadable mass of adjectives, percentages, testimonials, and calls to action, topped with a headline that read like the dosage instructions of a prescription drug.

He tried again. And again. And again. He worked all night tweaking the settings of the great machine, adjusting the data feed, even demanding new data from his crew. It was a vain pursuit; the copy it turned out was unusable at best.

As the sun rose, Bill was beginning to think that his copywriting machine was a bust.

“I just don’t get it. I’ve got the most accurate data, the best engineering, and the finest algorithm in the business. Something is missing. Something is not …”

His friend walked back into the shop carrying two hot coffees.

“Man, I hate to be the one to remind you,” she said, “but the only amazing copywriting machine in this room is the one right there inside your skull. That’s the only machine that can turn good data into true and immortal stories. Here’s your coffee.”

Bill’s machine burped up one final page. It was utterly blank.

The audience will not tune in to watch information. You wouldn’t, I wouldn’t. No one would or will. The audience will only tune in — and stay tuned in — to watch drama. ~ David Mamet

About the Author: Robert Bruce is VP of Marketing for Copyblogger Media, as well as its Resident Recluse. Get more from him via Twitter or Google+.


Copyblogger


Machine Learning and Link Spam: My Brush With Insanity

Posted by wrttnwrd


Know someone who thinks they’re smart? Tell them to build a machine learning tool. If they majored in, say, History in college, within 30 minutes they’ll be curled up in a ball, rocking back and forth while humming the opening bars of “Oklahoma.”

Sometimes, though, the alternative is rooting through 250,000 web pages by hand, checking them for compliance with Google’s TOS. Doing that will skip you right past the rocking-and-humming stage and launch you right into the writing-with-crayons-between-your-toes phase.

Those were my two choices six months ago. Several companies came to Portent asking for help with Penguin/manual penalties. They all, for one reason or another, had dirty link profiles.

Link analysis, the hard way. Back when I was a kid…

I did the first link profile review by hand, like this:

  1. Download a list of all external linking pages from SEOmoz, MajesticSEO, and Google Webmaster Tools.
  2. Remove obviously bad links by analyzing URLs. Face it: if a linking page is on a domain like “FreeLinksDirectory.com” or “ArticleSuccess.com,” it’s gotta go.
  3. Analyze the domain and page TrustRank and TrustFlow. Throw out anything with a zero, unless it’s on a list of ‘whitelisted’ domains.
  4. Grab thumbnails of each remaining linking page, using Python, Selenium, and PhantomJS (a rough sketch of this step follows the list). You don’t have to do this step, but it helps if you’re going to get help from other folks.
  5. Get some poor bugger (er, a faithful Portent team member) to review the thumbnails, quickly checking off whether they’re forums, blatant link spam, or something else.
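
Here’s the promised sketch of the thumbnail step, assuming the Selenium-plus-PhantomJS stack mentioned above (PhantomJS has since been abandoned; headless Chrome or Firefox is the modern replacement). The URLs and output paths are hypothetical:

    # Capture a thumbnail of each linking page for human review.
    import os
    from selenium import webdriver

    urls = ["http://freelinksdirectory.com/page1", "http://example.com/post"]
    os.makedirs("thumbs", exist_ok=True)

    driver = webdriver.PhantomJS()  # shipped with older Selenium releases
    driver.set_window_size(1024, 768)

    for i, url in enumerate(urls):
        try:
            driver.get(url)
            driver.save_screenshot("thumbs/link_%d.png" % i)  # one PNG per page
        except Exception as exc:
            print("Could not capture %s: %s" % (url, exc))

    driver.quit()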

After all of that prep work, my final review still took 10+ hours of eye-rotting agony.

There had to be a better way. I knew just enough about machine learning to realize it had possibilities, so I dove in. After all, how hard can it be?

Machine learning: the basic concept

The concept of machine learning isn’t that hard to grasp:

  1. Take a large dataset you need to classify. It could be book titles, people’s names, Facebook posts, or, for me, linking web pages.
  2. Define the categories. In this case, I’m looking for ‘spam’ and ‘good.’
  3. Get a collection of those items and classify them by hand. Or, if you’re really lucky, you find a collection that someone else classified for you. The Natural Language Toolkit, for example, has a movie reviews corpus you can use for sentiment analysis. This is your training set.
  4. Pick the right machine learning tool (hah).
  5. Configure it correctly (hahahahahahaha heee heeeeee sniff haa haaa… sorry, I’m ok… ha ha haaaaaaauuuugh).
  6. Feed in your training set, with the features — the item attributes used for classification — pre-selected. The tool will find patterns, if it can (giggle).
  7. Use the tool to compare each item in your dataset to the training set.
  8. The tool returns a classification of each item, plus its confidence in the classification and, if it’s really cool, the features that were most critical in that classification.

If you ignore the hysterical laughter, the process seems pretty simple. Alas, the laughter is a dead giveaway: these eight steps are easy the same way “Fly to moon, land on moon, fly home” is three easy steps.

Note: At this point, you could go ahead and use a pre-built toolset like BigML, Datameer, or Google’s Prediction API. Or, you could decide to build it all by hand. Which is what I did. You know, because I have so much spare time. If you’re unsure, keep reading. If this story doesn’t make you run, screaming, to the pre-built tools, start coding. You have my blessings.

The ingredients: Python, NLTK, scikit-learn

I sketched out the process for IIS (Is It Spam, not Internet Information Server) like this:

  1. Download a list of all external linking pages from SEOmoz, MajesticSEO, and Google Webmaster Tools.
  2. Use a little Python script to scrape the content of those pages.
  3. Get the SEOmoz and MajesticSEO metrics for each linking page.
  4. Build any additional features I wanted to use. I needed to calculate the reading grade level and links per word, for example. I also needed to pull out all meaningful words, and a count of those words.
  5. Finally, compare each result to my training set.

To do all of this, I needed a programming language, some kind of natural language processing (to figure out meaningful words, clean up HTML, etc.) and a machine learning algorithm that I could connect to the programming language.

I’m already a bit of a Python hacker (not a programmer – my code makes programmers cry), so Python was the obvious choice of programming language.

I’d dabbled a little with the Natural Language Toolkit (NLTK). It’s built for Python, and would easily filter out stop words, clean up HTML, and do all the other stuff I needed.

For my machine learning toolset, I picked a Python library called scikit-learn, mostly because there were tutorials out there that I could actually read.

I smushed it all together using some really-not-pretty Python code, and connected it to a MongoDB database for storage.
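
For a feel of what that not-pretty code does, here’s a simplified sketch of the feature-building step, assuming NLTK is installed with its ‘stopwords’ corpus downloaded. The syllable counter is a crude approximation, and the field names are mine, not the tool’s:

    # Build numeric features (reading level, links per word, meaningful words).
    import re
    from nltk.corpus import stopwords

    STOP = set(stopwords.words("english"))

    def count_syllables(word):
        # Rough heuristic: count runs of vowels.
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    def page_features(text, link_count):
        words = re.findall(r"[A-Za-z']+", text)
        sentences = max(1, len(re.findall(r"[.!?]+", text)))
        n = max(1, len(words))
        syllables = sum(count_syllables(w) for w in words)
        return {
            "links_per_word": link_count / n,
            "words_per_page": n,
            "meaningful_words": sum(1 for w in words if w.lower() not in STOP),
            # Flesch-Kincaid grade level formula
            "fk_grade": 0.39 * (n / sentences) + 11.8 * (syllables / n) - 15.59,
        }

    print(page_features("Buy cheap links here. Best links. Cheap!", link_count=12))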

A word about the training set

The training set makes or breaks the model. A good training set means your bouncing baby machine learning program has a good teacher. A bad training set means it’s got Edna Krabappel.

And accuracy alone isn’t enough. A training set also has to cover the full range of possible classification scenarios. One ‘good’ and one ‘spam’ page aren’t enough. You need hundreds or thousands to provide a nice range of possibilities. Otherwise, the machine learning program will stagger around, unable to classify items outside the narrow training set.

Luckily, our initial hand-review reinclusion method gave us a set of carefully-selected spam and good pages. That was our initial training set. Later on, we dug deeper and grew the training set by running Is It Spam and hand-verifying good and bad page results.

That worked great on Is It Spam 2.0. It didn’t work so well on 1.0.

First attempt: fail

For my first version of the tool, I used a Bayesian Filter as my machine learning tool. I figured, hey, it works for e-mail spam, why not SEO spam?

Apparently, I was already delirious at that point. Bayesian filtering works for e-mail spam about as well as fishing with a baseball bat. It does occasionally catch spam. It also misses a lot of it, dumps legitimate e-mail into spam folders, and generally amuses serious spammers the world over.

But, in my madness, I forgot all about these little problems. Is It Spam 1.0 seemed pretty great at first. Initial tests showed 75% accuracy. That may not sound great, but with accurate confidence data, it could really streamline link profile reviews. I was the proud papa of a baby machine learning tool.

But Bayesian filters can be ‘poisoned.’ If you feed the filter a training set where 90% of the spam pages talk about weddings, it’s possible the tool will begin seeing all wedding-related content as spam. That’s exactly what happened in my case: I fed in 10,000 or so pages of spammy wedding links (we do a lot of work in the wedding industry). On the next test run, Is It Spam decided that anything matrimonial was spam. Accuracy fell to 50%.

Since we tend to use the tool to evaluate sites in specific verticals, this would never work. Every test would likely poison the filter. We could build the training set to millions of pages, but my pointy little head couldn’t contemplate the infrastructure required to handle that.

The real problem with a pure Bayesian approach is that there’s really only one feature: The content of the page. It ignores things like links, page trust and authority.
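
For concreteness, here’s roughly what that single-feature setup looks like as a minimal scikit-learn sketch (the training texts are invented). The only signal is the page text, which is exactly what makes the model easy to poison:

    # Bag-of-words Naive Bayes: text is the only feature.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    train_texts = [
        "cheap wedding dresses discount wedding favors buy now",  # spam
        "our guide to planning a small backyard wedding",         # good
        "payday loans fast cash no credit check",                 # spam
        "how we photographed a winter wedding in Seattle",        # good
    ]
    train_labels = ["spam", "good", "spam", "good"]

    model = make_pipeline(CountVectorizer(), MultinomialNB())
    model.fit(train_texts, train_labels)

    # With a training set skewed toward wedding spam, innocent wedding
    # pages can start looking spammy -- the poisoning problem described above.
    print(model.predict(["wedding planning tips for brides"]))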

Oops. Back to the drawing board. I sent my little AI in for counseling, and a new brain.

Note: I wouldn’t have figured this out without help from SEOmoz’s Dr. Pete and Matt Peters. A ‘hat tip’ doesn’t seem like enough, but for now, it’ll have to do.

Second attempt: a qualified success

My second test used logistic regression. This machine learning model uses numeric data, not text. So, I could feed it more features. After the first exercise, this actually wasn’t too horrific. A few hours of work got me a tool that evaluates:

  • Page TrustFlow and CitationFlow (from MajesticSEO – I’m adding SEOmoz metrics now)
  • Links per word
  • Page Flesch-Kincaid reading grade level
  • Page Flesch-Kincaid reading ease
  • Words per page
  • Syllables per page
  • Characters per page
  • A few other seemingly-random bits, like images per page, misspellings, and grammar errors

This time, the tool worked a lot better. With vertical-specific training sets, it ran with 85%+ accuracy.
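
Here’s a sketch of that second attempt, again with hypothetical numbers in the spirit of the feature list above; real training rows would come from hand-labeled pages:

    # Logistic regression over numeric page features.
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # Columns: [trust_flow, links_per_word, fk_grade, words_per_page]
    X_train = [
        [5,  0.30, 2.1,  120],   # spam
        [42, 0.02, 9.4,  850],   # good
        [2,  0.45, 1.5,   90],   # spam
        [35, 0.03, 8.8, 1200],   # good
    ]
    y_train = [1, 0, 1, 0]  # 1 = spam, 0 = good

    model = make_pipeline(StandardScaler(), LogisticRegression())
    model.fit(X_train, y_train)

    # predict_proba supplies the confidence score mentioned earlier.
    page = [[12, 0.20, 4.0, 300]]
    print("P(spam) = %.2f" % model.predict_proba(page)[0, 1])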

In case you're wondering, that 85%+ accuracy run is what victory looks like.

When I tried to use the tool for more general tests, though, my coded kid tripped over its big, adolescent feet. Some of the funnier results:

  • It saw itself as spam.
  • It thought Rand’s blog was a swirling black hole of spammy despair.

False positives remain a big problem if we try to build a training set outside a single vertical.

Disappointing. But the tool chugs along happily within verticals, so we continue using it for that. We build a custom training set for each client, then run the remaining links through the trained model. The result is a relatively clear report.


Results and next steps

With little IIS learning to walk, we’ve cut the brute-force portion of large link profile evaluations from 30 hours to 3 hours. Not. Too. Shabby.

I tried to launch a public version of Is It Spam, but folks started using it to do real link profile evaluations, without checking their results. That scared the crap out of me, so I took the tool down until we cure the false positives problem.

I think we can address the false positives issue by adding a few features to the classification set:

  1. Bayesian filtering: Instead of depending on a Bayesian classification as 100% of the formula, we’ll use the Bayesian score as one more feature (see the sketch after this list).
  2. Grammar scoring: Anyone know a decent grammar testing algorithm in Python? If so, let me know. I’d love to add grammar quality as a feature.
  3. Anchor text matters a lot. The next generation of the tool needs to score the relevant link based on the anchor text. Is it a name (like in a byline)? Or is it a phrase (like in a keyword-stuffed link)?
  4. Link position may matter, too. This is another great feature that could help with spam detection. It might lead to more false positives, though. If Is It Spam sees a large number of spammy links in press release body copy, it may start rating other links located in body copy as spam, too. We’ll test to see if the other features are enough to help with this.
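
Here’s the sketch promised in item 1: use the Bayesian text score as just one numeric feature next to the others, rather than as the whole verdict. All values are invented, and in a real pipeline you’d compute the Bayesian scores on held-out data to avoid leakage:

    # Stack a Naive Bayes text score alongside numeric link features.
    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.naive_bayes import MultinomialNB

    texts = ["cheap wedding dresses buy now", "our backyard wedding story",
             "payday loans fast cash", "winter wedding photo diary"]
    numeric = np.array([[5, 0.30], [42, 0.02], [2, 0.45], [35, 0.03]])  # [trust_flow, links_per_word]
    y = np.array([1, 0, 1, 0])  # 1 = spam

    vec = CountVectorizer()
    bow = vec.fit_transform(texts)
    nb = MultinomialNB().fit(bow, y)
    bayes_score = nb.predict_proba(bow)[:, [1]]  # P(spam | text) as a column

    # Combine: numeric features plus the Bayesian score as one more column.
    X = np.hstack([numeric, bayes_score])
    combined = LogisticRegression().fit(X, y)
    print(combined.predict_proba(X)[:, 1])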

If I'm lucky, one or more of these changes may yield a tool that can evaluate pages across different verticals. If I'm lucky.

Insights

This is by far the most challenging development project I've ever tried. I probably wore another 10 years' enamel off my teeth in just six weeks. But it's been productive:

  1. When you start digging into automated page analysis and machine learning, you learn a lot about how computers evaluate language. That's awfully relevant if you're a 21st Century marketer.
  2. I uncovered an interesting pattern in Google's Penguin implementation. This is based on my fumbling about with machine learning, so take it with a grain of salt, but have a look here.
  3. We learned that there is no such thing as a spammy page. There are only spammy links. One link from a particular page may be totally fine: For example, a brand link from a press release page. Another link from that same page may be spam: For example, a keyword-stuffed link from the same press release.
  4. We've reduced time required for an initial link profile evaluation by a factor of ten.

It's also been a great humility-building exercise.



SEOmoz Daily SEO Blog


How to Build and Operate a Content Marketing Machine

Posted by Toby Murdock

Content Marketing is hot. White hot. SEO and digital marketing thought leaders are declaring that Content Marketing is the next big thing. Even Rand is touting its importance.

The strategy of Content Marketing makes sense: instead of pushing messages about your product at prospects, pull prospects towards you by publishing content about your prospects’ interests. Search rank, traffic, leads and all sort of goodness flow from this approach.

So the conversation is no longer about if or why an organization should practice Content Marketing. But the still unanswered question is “How?” How does a brand actually become a publisher, produce great content, and attract traffic and generate conversions?

So if you’re wondering “How?”, fear not. This post will provide a guide on how to build and operate a Content Marketing Machine. But, to be clear, I’m not talking about dipping a toe in the water: doing some blog posts, busting out an infographic. I’m talking about a sustained effort to generate content excellence in your category. I’m talking about a machine that generates more traffic and leads at lower cost than all of your other channels combined.

The Machine

First, let’s take a look at the machine, all of its pistons, cogs, smokestacks and miscellaneous parts. This will give you an overview of what you’re building and what you’re going to operate.

Now we’ll go over the machine, part by part.

Goals & Plan

What is the goal, the end output for your Content Marketing Machine? Content marketing is utilized for lots of objectives, including customer retention, upsell, support and brand awareness. But by far the major objective for most Content Marketers is Lead Generation / Customer Acquisition, which can take the form of adding an item to a shopping cart, filling out a lead-gen form, or signing up for a trial.

Your plan then becomes to create a content-powered path that takes your prospect from where they are today to the end goal. This plan is best plotted on a matrix, called The Content Grid, where one axis lists your customer personas and the other axis lists the various stages in the buying cycle.

Then for each cell in this grid, you have to ascertain what content can attract the persona to that stage and help move them on to the next stage. Specifically each cell should answer the following questions:

  • What questions does the Persona want to answer at this stage in the process?
  • What are the topics and categories that would provide this content and answer these questions?
  • What are some sample headlines for content in each cell?
  • What formats (blog posts, videos, eBooks, etc.) would this content be delivered through?

Remember, at the top of your buying cycle, the prospect does not care at all about you and your brand. Your content here should be at some intersection between your prospect’s interests and the expertise within your organization. The content here at the top should never promote your own products and services. But as you move down the Content Grid and the prospect has indicated interest in your products and services, your content should provide more information about them.

Team

So you’ve got a plan. Now you have to figure out who is going to execute it. Begin by looking at your grid. Who can produce these pieces of content? Is it going to be internal contributors? External paid freelancers? Guest posters?

Naturally this depends a good amount on your budget. But for most organizations it is a mix of internal and external contributors: you want to utilize your unique internal expertise, but you also use external talents to share the burden, particularly on rich media content like video and infographics.

While there is a variance in the mix for the set of contributors, there is one consistent, crucial role: the Managing Editor. Many stakeholders will submit ideas and content into the Content Marketing Machine, will turn its Audience Development crank, and will pull leads and reports out of the Machine. But you need at least one person whose primary responsibility is to man the controls of the machine: to plan the editorial calendar, to supervise content production and distribution, to generate traffic and conversions, to monitor metrics and to be accountable for results. Without such a person, you aren’t operating a Machine, but rather a small appliance (perhaps a Content Marketing toaster).

Ideally the Managing Editor should have content experience from a journalism, copy writing or PR background. But the Managing Editor should also know the web and the ways of search, social, analytics and link-building. Lastly the Managing Editor should be familiar with marketing and the end objectives of driving traffic and conversions.

Ideas

The Ideas section of the Content Marketing Machine is where marketers most often struggle. In the Content Marketing Institute’s 2012 Content Marketing Research Report, over half cited consistently outputting content as their greatest challenge, with a particular struggle over figuring out what to produce. To truly become a publisher requires consistently producing content 3, 4, 5 times a week. What in the world, marketers lament, am I going to write about every day?

Remember: the bulk of the content that you are going to produce is about your customers’ interests, not about your products. Thus the best way to generate content ideas is to understand what your customers are interested in.

There are two best practices for idea generation. First is online social listening. Dive into the categories you are covering on Twitter, Facebook, LinkedIn, etc. See what topics the communities are interested in. Q&A sites like Quora and Yahoo Answers can identify the specific questions your prospects want answered.

The other best practice is to leverage the ears in your organization. Your colleagues in sales, services, support, etc. are talking with customers every day. Encourage them to listen for nuggets of customer concern and then submit those into the Content Marketing team. To give your colleagues incentive to participate, make sure that their submissions don’t end up in a black box. Instead, if you reject them, let them know. If you accept them and convert the idea into content, keep them informed of the content and how it performs. The best organizations at this even keep a leaderboard to showcase which employees are making the best contribution to the Content Marketing ideas effort.

Production

As you get your idea generation going, you’ll then need to operate the heart of the Content Marketing Machine, the content production. The centerpiece of production is an Editorial Calendar. The calendar should specify who is going to create what piece of content, when they will have it submitted, when you plan on publishing it, and to where you plan on publishing it (your site, YouTube, Slideshare, all of the above, etc.).


In your Editorial Calendar you should also note the Customer Persona and Buying Stage that the content is intended for. As you look over your Calendar, you should be able to see at a glance whether or not you’re producing the right content mix to cover the various cells in your Content Grid.

Many organizations can get buried in the logistics of the Production stage. Many stakeholders can be involved, including: the idea generator, the content creator, graphic designers, the Managing Editor, the SEO expert, the social media team, Legal & PR (for approvals), etc. Often too much of the effort goes into coordinating these players instead of creating great content.

If you’re in a moderately sized organization with decent complexity, make sure you map out the process involved in getting content out the door. Who will submit the content? Who needs to approve it, and at what stage of the process? Who is going to be posting messages to Twitter, Facebook and LinkedIn once the content has been published? Identify the required workflows and have a plan to manage them so that your efforts don’t get consumed by administrative tasks.

Audience Development

So you’re publishing content now! Your machine is up and running! Congratulations!

However, creating the content is just half of your task. The other half is getting visitors to that content, which is the Audience Development component of the Content Marketing Machine. Audience Development breaks down into 4 major buckets:

  • Influencers
  • Search
  • Paid
  • Syndication

Influencers. Influencers are the most important component of Audience Development. Begin by identifying the influencers in your space: the individuals and organizations in your topic that have lots of visitors to their sites, followers to their Twitter accounts, etc. In other words, these are the places on the web where the prospects who you want to read your content hang out.

Your objective is to win links from these Influencers to your content. Get started by building relationships with these Influencers. Retweet their tweets. Comment on their blogs. Get into a dialog.

Once you’ve gotten on the Influencer’s radar, craft content with the end objective (the Influencer link) in mind. Ask yourself: what content would be of enough interest to this Influencer that they would want to share it with their audience? Or try to bring the Influencer into the process from the start: tell them that you are working on a piece of content and would appreciate their feedback or a quote.

Search. Winning these Influencer links is the key to getting referral traffic to your content. It is also the biggest way that you can improve category two in Audience Development: search traffic. Win links from authoritative Influencers, and the search engines will improve your rank, driving more traffic. Of course you need to be deliberate about this process: identify the search keywords that your personas will search for; target and optimize your content for those keywords; and track how your content efforts, keyword by keyword, are affecting your search ranking.

Paid. Despite all of the inbound, organic goodness that Content Marketing centers on, paid traffic does have a place in the mix. Whether it is SEM, Facebook ads, sponsored Tweets, or paid email newsletter distribution, using paid tactics to drive traffic to content is part of the Content Marketing Machine. What’s interesting to note, however, is how Content Marketers are using paid to drive traffic to their content pages (i.e., about the prospect’s interests) instead of their product pages (about the marketer’s products). The process of developing a relationship with a prospect built on informative content is so powerful that marketers are taking the more patient but more effective approach of buying traffic to their content.

Syndication. Finally, the content you produce need not be limited to your own properties, whether your site, YouTube account, Slideshare account, etc. The most straightforward way to earn a link from a site your prospects frequent is to give that site quality content. Syndicating your content earns at least one link to your site through your author bio, but it also begins to develop a relationship between you and your prospects before they have ever visited your site. Particularly at the beginning, other sites have a lot more traffic than yours does, so syndicating content there is a great way to get your traffic off the ground.

Measurement & Conversion

OK, now the Machine is running full tilt! You have content being produced, and visitors coming for that content. As the Machine runs, you need to keep an eye on a set of gauges for each part of the machine so that you can learn how it’s running and continue to tune it and optimize performance.

Ideas & Production. Keep an eye on the mix of content you are pushing out the door. Do you have the right distribution across the personas from your Content Grid? Are you hitting the relevant categories?

Audience Development. What Influencers are sending you the most traffic? You should be sure to express your gratitude to these Influencers and link back to them. What types of content are succeeding in generating the most valuable links? You need to double down on that content. What keywords have high search volumes but fail to drive you much traffic? You need to improve your production of content around those keywords to improve your rank. Which paid channels are delivering the most cost-effective traffic?

Traffic & Conversion. This is the major objective, as it gets to our end goal of the conversion. All of your content needs to be assessed for how it is performing in bringing first-time visitors to your site, bringing back returning visitors, and moving them down the buying cycle, particularly to the conversion event (e.g., form submission, add to cart, start a trial) that you are looking to track. Score all of your content on these objectives, and look for the trends: Which authors are pulling in the most new visitors? Which content types (e.g., blog post, eBook, video) are keeping each of my personas coming back? Which categories of content are leading to the most conversion events?

Every initial content strategy is a best guess. Only by operating your Machine and monitoring your metrics can you understand what’s working and what’s not working and improve your performance over time.

Building Your Own Machine (versus Renting Someone Else’s)

And indeed, you have to recognize that the results of Content Marketing accrue over time. Traditional marketing tactics, i.e. advertising, involve the Marketer renting the attention of someone else’s audience: the marketer pays the media to be able to put the marketer’s message in front of the media’s audience. Despite the problems of advertising, this renting has immediate effects, because the media already has an audience.

Content Marketing takes longer, particularly because, when you start, you have no audience! But don’t be deterred! Just like the difference between buying and renting a house, with Content Marketing you are building equity as you build your audience. Over time, your audience becomes an incredible asset: a perpetual source of leads / trials / new customers at extremely low cost relative to traditional marketing (i.e. advertising). There are now many brands who have successfully built and now operate such a Content Marketing Machine (here are 50 examples).

The highest state of Content Marketing nirvana is for your Content Marketing Machine to become self-perpetuating. Typically the machine works with content as the input and audience / leads as the output. But once you’ve become such an authority on your topic, your output, the audience, will begin to supply the inputs, the content (see the prior section on Syndication).

SEOmoz has, very deservedly, reached this highest state of Content Marketing nirvana. I, in fact, am an audience member providing the inputs! I hope that these inputs, this content, have been helpful to you as you look to build and operate your own Content Marketing Machine. I’m eager to answer any questions. Please fire away in the comments!



SEOmoz Daily SEO Blog


Machine Readable Disclosure

You Must Disclose, or Else…

Matt Cutts has long stated that machine-readable disclosure of paid links (i.e., a rel="nofollow" attribute on the link) is required to stay within Google’s guidelines.

The idea behind such Cassandra calls is that the web should be graded based on merit, rather than who has the largest ad budget. The Google founders harped on this in their early research:

we expect that advertising funded search engines will be inherently biased towards the advertisers and away from the needs of the consumers.

Google is not the only search engine in town, and they have been less forthcoming with their own behavior than what they demand of others.

Ads as Content

Both SugarRae and I have highlighted how Google’s investment in VigLink is (at best) sketchy given Google’s approach to non-Google affiliate links. And now Google’s “ads as content” program has spread to Youtube, where Google counts ad views as video views. The problem with this is that any external search service has no way to separate out which views were organic & which were driven by paid exposure.

(Google has access to that data since they charge the advertisers for each view, but there is no way for any external party to access that data, or know how Google is using it other than what Google states publicly).

That is the *exact* type of pollution Google claimed would undermine the web. But it is only bad when someone is doing it to Google (rather than the other way around).

Youtube = Wikipedia + Wikipedia + Wikipedia

As independent webmasters it can be annoying seeing Wikipedia rank for everything under the sun, but after Google’s “universal search” push Youtube is far more dominant than Wikipedia. When the Panda update happened Youtube was around 4% of Google’s downstream traffic. Youtube has grown their Google-referred traffic by about 4% a month since Panda, up until last month, in which it grew by 18.3% according to Compete.com. That now puts Youtube at over 5% of Google’s downstream traffic (over 3x as much traffic as Wikipedia gets from Google)!

1 in 20 downstream clicks is landing onto a nepotistic property where Google has blurred the lines between ads and content, making it essentially impossible for competing search services to score relevancy (in addition to making features inaccessible, the data that is accessible is polluted). It is unsurprising that Youtube is a significant anti-trust issue:

Google acquired YouTube—and since then it has put in place a growing number of technical measures to restrict competing search engines from properly accessing it for their search results. Without proper access to YouTube, Bing and other search engines cannot stand with Google on an equal footing in returning search results with links to YouTube videos and that, of course, drives more users away from competitors and to Google.

Google promotes “openness” wherever they are weak, and then they erect proprietary barriers to erode competitive threat wherever they are strong.

Playing Politics

At some point it is hard to operate as a monopoly without being blindingly hypocritical. And this is at the core of why Google’s leading engineers feel the need to write guest articles in Politico & Eric Schmidt is working directly with governments to prevent regulatory action. They understand that if they curry favor they can better limit the damage and have more control of what sacrificial anodes die in the eventual anti-trust proceedings.

Is Google Lying Again?

As a marketer & a publisher you can go bankrupt before governments react to monopolies. Thus you need to decide what risks are worthwhile & what suggestions carry any weight.

Here is the litmus test for whether a piece of information from Google is more self-serving garbage: does Google apply the same principles to itself, in markets it is easily winning AND in markets it is losing badly?

If their suggestion doesn’t apply to Google across-the-board then you can safely ignore it as more self-serving drivel from a monopolist.


SEO Book.com



